* [PATCH v2 1/22] fsverity: report validation errors back to the filesystem
2026-01-12 14:49 [PATCH v2 0/23] fs-verity support for XFS with post EOF merkle tree Andrey Albershteyn
@ 2026-01-12 14:49 ` Darrick J. Wong
2026-01-13 1:29 ` Darrick J. Wong
2026-01-12 14:49 ` [PATCH v2 2/22] fsverity: expose ensure_fsverity_info() Andrey Albershteyn
` (21 subsequent siblings)
22 siblings, 1 reply; 86+ messages in thread
From: Darrick J. Wong @ 2026-01-12 14:49 UTC (permalink / raw)
To: fsverity, linux-xfs, ebiggers, linux-fsdevel, aalbersh, aalbersh,
djwong
Cc: djwong, david, hch
Provide a new function call so that validation errors can be reported
back to the filesystem.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
---
fs/verity/verify.c | 4 ++++
include/linux/fsverity.h | 14 ++++++++++++++
include/trace/events/fsverity.h | 19 +++++++++++++++++++
3 files changed, 37 insertions(+), 0 deletions(-)
diff --git a/fs/verity/verify.c b/fs/verity/verify.c
index 47a66f088f..ef411cf5d8 100644
--- a/fs/verity/verify.c
+++ b/fs/verity/verify.c
@@ -271,6 +271,10 @@
data_pos, level - 1, params->hash_alg->name, hsize, want_hash,
params->hash_alg->name, hsize,
level == 0 ? dblock->real_hash : real_hash);
+ trace_fsverity_file_corrupt(inode, data_pos, params->block_size);
+ if (inode->i_sb->s_vop->file_corrupt)
+ inode->i_sb->s_vop->file_corrupt(inode, data_pos,
+ params->block_size);
error:
for (; level > 0; level--) {
kunmap_local(hblocks[level - 1].addr);
diff --git a/include/linux/fsverity.h b/include/linux/fsverity.h
index 5bc7280425..b75e232890 100644
--- a/include/linux/fsverity.h
+++ b/include/linux/fsverity.h
@@ -128,6 +128,20 @@
*/
int (*write_merkle_tree_block)(struct inode *inode, const void *buf,
u64 pos, unsigned int size);
+
+ /**
+ * Notify the filesystem that file data is corrupt.
+ *
+ * @inode: the inode being validated
+ * @pos: the file position of the invalid data
+ * @len: the length of the invalid data
+ *
+ * This function is called when fs-verity detects that a portion of a
+ * file's data is inconsistent with the Merkle tree, or a Merkle tree
+ * block needed to validate the data is inconsistent with the level
+ * above it.
+ */
+ void (*file_corrupt)(struct inode *inode, loff_t pos, size_t len);
};
#ifdef CONFIG_FS_VERITY
diff --git a/include/trace/events/fsverity.h b/include/trace/events/fsverity.h
index dab220884b..375fdddac6 100644
--- a/include/trace/events/fsverity.h
+++ b/include/trace/events/fsverity.h
@@ -137,6 +137,25 @@
__entry->hidx)
);
+TRACE_EVENT(fsverity_file_corrupt,
+ TP_PROTO(const struct inode *inode, loff_t pos, size_t len),
+ TP_ARGS(inode, pos, len),
+ TP_STRUCT__entry(
+ __field(ino_t, ino)
+ __field(loff_t, pos)
+ __field(size_t, len)
+ ),
+ TP_fast_assign(
+ __entry->ino = inode->i_ino;
+ __entry->pos = pos;
+ __entry->len = len;
+ ),
+ TP_printk("ino %lu pos %llu len %zu",
+ (unsigned long) __entry->ino,
+ __entry->pos,
+ __entry->len)
+);
+
#endif /* _TRACE_FSVERITY_H */
/* This part must be outside protection */
--
- Andrey
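
For readers following along outside the kernel tree, the optional-hook
pattern this patch adds can be sketched in user-space C. This is a minimal
model, not the kernel code: every demo_* name is made up for illustration,
the tracepoint is omitted, and the hash comparison is reduced to a flag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
struct demo_inode;

struct demo_verity_ops {
	/* Optional: left NULL by filesystems that do not want reports. */
	void (*file_corrupt)(struct demo_inode *inode, int64_t pos, size_t len);
};

struct demo_inode {
	const struct demo_verity_ops *vop;
	unsigned int corrupt_reports;	/* bumped by the demo hook below */
	int64_t last_pos;
	size_t last_len;
};

/* Example hook: a real filesystem might mark the inode sick here. */
static void demo_fs_file_corrupt(struct demo_inode *inode, int64_t pos,
				 size_t len)
{
	inode->corrupt_reports++;
	inode->last_pos = pos;
	inode->last_len = len;
}

/*
 * Model of the verification failure path: on a mismatch, report the bad
 * range to the filesystem only if it registered a hook, then fail.
 * Returns 0 on a hash match and -1 on a mismatch.
 */
static int demo_verify_block(struct demo_inode *inode, int64_t data_pos,
			     size_t block_size, int hash_matches)
{
	assert(inode != NULL && inode->vop != NULL);

	if (hash_matches)
		return 0;

	if (inode->vop->file_corrupt)
		inode->vop->file_corrupt(inode, data_pos, block_size);
	return -1;
}
```

An inode whose ops table leaves file_corrupt unset still fails verification
on a mismatch; the hook only adds the notification.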
^ permalink raw reply related [flat|nested] 86+ messages in thread
* Re: [PATCH v2 1/22] fsverity: report validation errors back to the filesystem
2026-01-12 14:49 ` [PATCH v2 1/22] fsverity: report validation errors back to the filesystem Darrick J. Wong
@ 2026-01-13 1:29 ` Darrick J. Wong
2026-01-13 8:09 ` Christoph Hellwig
2026-01-13 10:27 ` Andrey Albershteyn
0 siblings, 2 replies; 86+ messages in thread
From: Darrick J. Wong @ 2026-01-13 1:29 UTC (permalink / raw)
To: Darrick J. Wong
Cc: fsverity, linux-xfs, ebiggers, linux-fsdevel, aalbersh, david,
hch
> To: "Darrick J. Wong" <aalbersh@redhat.com>
Say ^^^^^^^ what?
On Mon, Jan 12, 2026 at 03:49:50PM +0100, Darrick J. Wong wrote:
> Provide a new function call so that validation errors can be reported
> back to the filesystem.
>
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
> ---
> fs/verity/verify.c | 4 ++++
> include/linux/fsverity.h | 14 ++++++++++++++
> include/trace/events/fsverity.h | 19 +++++++++++++++++++
> 3 files changed, 37 insertions(+), 0 deletions(-)
>
> diff --git a/fs/verity/verify.c b/fs/verity/verify.c
> index 47a66f088f..ef411cf5d8 100644
> --- a/fs/verity/verify.c
> +++ b/fs/verity/verify.c
> @@ -271,6 +271,10 @@
> data_pos, level - 1, params->hash_alg->name, hsize, want_hash,
> params->hash_alg->name, hsize,
> level == 0 ? dblock->real_hash : real_hash);
> + trace_fsverity_file_corrupt(inode, data_pos, params->block_size);
> + if (inode->i_sb->s_vop->file_corrupt)
> + inode->i_sb->s_vop->file_corrupt(inode, data_pos,
> + params->block_size);
If fserror_report[1] gets merged before this series, I think we should
add a new FSERR_ type and call fserror_report instead.
https://lore.kernel.org/linux-fsdevel/176826402610.3490369.4378391061533403171.stgit@frogsfrogsfrogs/T/#u
--D
> error:
> for (; level > 0; level--) {
> kunmap_local(hblocks[level - 1].addr);
> diff --git a/include/linux/fsverity.h b/include/linux/fsverity.h
> index 5bc7280425..b75e232890 100644
> --- a/include/linux/fsverity.h
> +++ b/include/linux/fsverity.h
> @@ -128,6 +128,20 @@
> */
> int (*write_merkle_tree_block)(struct inode *inode, const void *buf,
> u64 pos, unsigned int size);
> +
> + /**
> + * Notify the filesystem that file data is corrupt.
> + *
> + * @inode: the inode being validated
> + * @pos: the file position of the invalid data
> + * @len: the length of the invalid data
> + *
> + * This function is called when fs-verity detects that a portion of a
> + * file's data is inconsistent with the Merkle tree, or a Merkle tree
> + * block needed to validate the data is inconsistent with the level
> + * above it.
> + */
> + void (*file_corrupt)(struct inode *inode, loff_t pos, size_t len);
> };
>
> #ifdef CONFIG_FS_VERITY
> diff --git a/include/trace/events/fsverity.h b/include/trace/events/fsverity.h
> index dab220884b..375fdddac6 100644
> --- a/include/trace/events/fsverity.h
> +++ b/include/trace/events/fsverity.h
> @@ -137,6 +137,25 @@
> __entry->hidx)
> );
>
> +TRACE_EVENT(fsverity_file_corrupt,
> + TP_PROTO(const struct inode *inode, loff_t pos, size_t len),
> + TP_ARGS(inode, pos, len),
> + TP_STRUCT__entry(
> + __field(ino_t, ino)
> + __field(loff_t, pos)
> + __field(size_t, len)
> + ),
> + TP_fast_assign(
> + __entry->ino = inode->i_ino;
> + __entry->pos = pos;
> + __entry->len = len;
> + ),
> + TP_printk("ino %lu pos %llu len %zu",
> + (unsigned long) __entry->ino,
> + __entry->pos,
> + __entry->len)
> +);
> +
> #endif /* _TRACE_FSVERITY_H */
>
> /* This part must be outside protection */
>
> --
> - Andrey
>
>
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH v2 1/22] fsverity: report validation errors back to the filesystem
2026-01-13 1:29 ` Darrick J. Wong
@ 2026-01-13 8:09 ` Christoph Hellwig
2026-01-13 10:27 ` Andrey Albershteyn
1 sibling, 0 replies; 86+ messages in thread
From: Christoph Hellwig @ 2026-01-13 8:09 UTC (permalink / raw)
To: Darrick J. Wong
Cc: Darrick J. Wong, fsverity, linux-xfs, ebiggers, linux-fsdevel,
aalbersh, david, hch
On Mon, Jan 12, 2026 at 05:29:11PM -0800, Darrick J. Wong wrote:
> > To: "Darrick J. Wong" <aalbersh@redhat.com>
>
> Say ^^^^^^^ what?
Heh. Looks like somehow your patches being resent all got messed up.
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH v2 1/22] fsverity: report validation errors back to the filesystem
2026-01-13 1:29 ` Darrick J. Wong
2026-01-13 8:09 ` Christoph Hellwig
@ 2026-01-13 10:27 ` Andrey Albershteyn
2026-01-13 17:52 ` Darrick J. Wong
1 sibling, 1 reply; 86+ messages in thread
From: Andrey Albershteyn @ 2026-01-13 10:27 UTC (permalink / raw)
To: Darrick J. Wong
Cc: fsverity, linux-xfs, ebiggers, linux-fsdevel, aalbersh, david,
hch
On 2026-01-12 17:29:11, Darrick J. Wong wrote:
> > To: "Darrick J. Wong" <aalbersh@redhat.com>
>
> Say ^^^^^^^ what?
oh damn, sorry, that's my broken script
>
> On Mon, Jan 12, 2026 at 03:49:50PM +0100, Darrick J. Wong wrote:
> > Provide a new function call so that validation errors can be reported
> > back to the filesystem.
> >
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
> > ---
> > fs/verity/verify.c | 4 ++++
> > include/linux/fsverity.h | 14 ++++++++++++++
> > include/trace/events/fsverity.h | 19 +++++++++++++++++++
> > 3 files changed, 37 insertions(+), 0 deletions(-)
> >
> > diff --git a/fs/verity/verify.c b/fs/verity/verify.c
> > index 47a66f088f..ef411cf5d8 100644
> > --- a/fs/verity/verify.c
> > +++ b/fs/verity/verify.c
> > @@ -271,6 +271,10 @@
> > data_pos, level - 1, params->hash_alg->name, hsize, want_hash,
> > params->hash_alg->name, hsize,
> > level == 0 ? dblock->real_hash : real_hash);
> > + trace_fsverity_file_corrupt(inode, data_pos, params->block_size);
> > + if (inode->i_sb->s_vop->file_corrupt)
> > + inode->i_sb->s_vop->file_corrupt(inode, data_pos,
> > + params->block_size);
>
> If fserror_report[1] gets merged before this series, I think we should
> add a new FSERR_ type and call fserror_report instead.
>
> https://lore.kernel.org/linux-fsdevel/176826402610.3490369.4378391061533403171.stgit@frogsfrogsfrogs/T/#u
I see, will health monitoring also use these events? I mean if XFS's
fsverity need to report corrupting error then
--
- Andrey
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH v2 1/22] fsverity: report validation errors back to the filesystem
2026-01-13 10:27 ` Andrey Albershteyn
@ 2026-01-13 17:52 ` Darrick J. Wong
0 siblings, 0 replies; 86+ messages in thread
From: Darrick J. Wong @ 2026-01-13 17:52 UTC (permalink / raw)
To: Andrey Albershteyn
Cc: fsverity, linux-xfs, ebiggers, linux-fsdevel, aalbersh, david,
hch
On Tue, Jan 13, 2026 at 11:27:49AM +0100, Andrey Albershteyn wrote:
> On 2026-01-12 17:29:11, Darrick J. Wong wrote:
> > > To: "Darrick J. Wong" <aalbersh@redhat.com>
> >
> > Say ^^^^^^^ what?
>
> oh damn, sorry, that's my broken script
>
> >
> > On Mon, Jan 12, 2026 at 03:49:50PM +0100, Darrick J. Wong wrote:
> > > Provide a new function call so that validation errors can be reported
> > > back to the filesystem.
> > >
> > > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > > Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
> > > ---
> > > fs/verity/verify.c | 4 ++++
> > > include/linux/fsverity.h | 14 ++++++++++++++
> > > include/trace/events/fsverity.h | 19 +++++++++++++++++++
> > > 3 files changed, 37 insertions(+), 0 deletions(-)
> > >
> > > diff --git a/fs/verity/verify.c b/fs/verity/verify.c
> > > index 47a66f088f..ef411cf5d8 100644
> > > --- a/fs/verity/verify.c
> > > +++ b/fs/verity/verify.c
> > > @@ -271,6 +271,10 @@
> > > data_pos, level - 1, params->hash_alg->name, hsize, want_hash,
> > > params->hash_alg->name, hsize,
> > > level == 0 ? dblock->real_hash : real_hash);
> > > + trace_fsverity_file_corrupt(inode, data_pos, params->block_size);
> > > + if (inode->i_sb->s_vop->file_corrupt)
> > > + inode->i_sb->s_vop->file_corrupt(inode, data_pos,
> > > + params->block_size);
> >
> > If fserror_report[1] gets merged before this series, I think we should
> > add a new FSERR_ type and call fserror_report instead.
> >
> > https://lore.kernel.org/linux-fsdevel/176826402610.3490369.4378391061533403171.stgit@frogsfrogsfrogs/T/#u
>
> I see, will health monitoring also use these events? I mean if XFS's
> fsverity need to report corrupting error then
Yes, the stuff you put in xfs' ->file_corrupt function would get moved
to xfs' ->report_error function. Something like:
static void
xfs_fs_report_error(
const struct fserror_event *event)
{
if (event->inode && event->type == FSERR_VERITY &&
IS_VERITY(event->inode))
xfs_inode_mark_sick(XFS_I(event->inode),
XFS_SICK_INO_DATA);
/* healthmon already knows about non-inode and metadata errors */
if (event->inode && event->type != FSERR_METADATA)
xfs_healthmon_report_file_ioerror(XFS_I(event->inode), event);
}
static const struct super_operations xfs_super_operations = {
<snip>
.report_error = xfs_fs_report_error,
};
--D
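
The filtering in the sketch above can be tried out in user space. The
following is only a model under assumptions: the sketch_* types stand in
for the fserror event and the XFS inode, SKETCH_FSERR_VERITY mirrors the
FSERR_ type proposed above (which does not exist yet), and the two fields
replace xfs_inode_mark_sick() and the healthmon call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event types; only the verity one matters for this sketch. */
enum sketch_fserr_type {
	SKETCH_FSERR_METADATA,
	SKETCH_FSERR_DATA_IO,
	SKETCH_FSERR_VERITY,
};

struct sketch_inode {
	bool is_verity;			/* stands in for IS_VERITY() */
	bool data_sick;			/* stands in for XFS_SICK_INO_DATA */
	unsigned int healthmon_events;	/* stands in for the healthmon report */
};

struct sketch_event {
	struct sketch_inode *inode;	/* NULL for non-inode errors */
	enum sketch_fserr_type type;
};

/* One entry point receives every event and filters by type. */
static void sketch_report_error(const struct sketch_event *event)
{
	assert(event != NULL);

	/* Verity failures on verity files mark the file data as sick. */
	if (event->inode && event->type == SKETCH_FSERR_VERITY &&
	    event->inode->is_verity)
		event->inode->data_sick = true;

	/* Non-inode and metadata errors are assumed to be reported elsewhere. */
	if (event->inode && event->type != SKETCH_FSERR_METADATA)
		event->inode->healthmon_events++;
}
```

The point of the design is that fs-verity would no longer need its own
callback: it raises a typed event and the filesystem decides what to do
with it in one place.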
> --
> - Andrey
>
>
^ permalink raw reply [flat|nested] 86+ messages in thread
* [PATCH v2 2/22] fsverity: expose ensure_fsverity_info()
2026-01-12 14:49 [PATCH v2 0/23] fs-verity support for XFS with post EOF merkle tree Andrey Albershteyn
2026-01-12 14:49 ` [PATCH v2 1/22] fsverity: report validation errors back to the filesystem Darrick J. Wong
@ 2026-01-12 14:49 ` Andrey Albershteyn
2026-01-12 22:05 ` Darrick J. Wong
2026-01-12 14:50 ` [PATCH v2 3/22] iomap: introduce IOMAP_F_BEYOND_EOF Andrey Albershteyn
` (20 subsequent siblings)
22 siblings, 1 reply; 86+ messages in thread
From: Andrey Albershteyn @ 2026-01-12 14:49 UTC (permalink / raw)
To: fsverity, linux-xfs, ebiggers, linux-fsdevel, aalbersh, aalbersh,
djwong
Cc: djwong, david, hch
This function will be used by XFS's scrub to force fsverity activation
and, in doing so, to read the fsverity context.
Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
---
fs/verity/open.c | 4 ++--
include/linux/fsverity.h | 2 ++
2 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/fs/verity/open.c b/fs/verity/open.c
index 77b1c977af..956ddaffbf 100644
--- a/fs/verity/open.c
+++ b/fs/verity/open.c
@@ -350,7 +350,7 @@
return 0;
}
-static int ensure_verity_info(struct inode *inode)
+int fsverity_ensure_verity_info(struct inode *inode)
{
struct fsverity_info *vi = fsverity_get_info(inode);
struct fsverity_descriptor *desc;
@@ -380,7 +380,7 @@
{
if (filp->f_mode & FMODE_WRITE)
return -EPERM;
- return ensure_verity_info(inode);
+ return fsverity_ensure_verity_info(inode);
}
EXPORT_SYMBOL_GPL(__fsverity_file_open);
diff --git a/include/linux/fsverity.h b/include/linux/fsverity.h
index b75e232890..16a6e51705 100644
--- a/include/linux/fsverity.h
+++ b/include/linux/fsverity.h
@@ -196,6 +196,8 @@
int __fsverity_prepare_setattr(struct dentry *dentry, struct iattr *attr);
void __fsverity_cleanup_inode(struct inode *inode);
+int fsverity_ensure_verity_info(struct inode *inode);
+
/**
* fsverity_cleanup_inode() - free the inode's verity info, if present
* @inode: an inode being evicted
--
- Andrey
^ permalink raw reply related [flat|nested] 86+ messages in thread
* Re: [PATCH v2 2/22] fsverity: expose ensure_fsverity_info()
2026-01-12 14:49 ` [PATCH v2 2/22] fsverity: expose ensure_fsverity_info() Andrey Albershteyn
@ 2026-01-12 22:05 ` Darrick J. Wong
0 siblings, 0 replies; 86+ messages in thread
From: Darrick J. Wong @ 2026-01-12 22:05 UTC (permalink / raw)
To: Andrey Albershteyn
Cc: fsverity, linux-xfs, ebiggers, linux-fsdevel, aalbersh, david,
hch
On Mon, Jan 12, 2026 at 03:49:57PM +0100, Andrey Albershteyn wrote:
> This function will be used by XFS's scrub to force fsverity activation
> and, in doing so, to read the fsverity context.
>
> Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
> ---
> fs/verity/open.c | 4 ++--
> include/linux/fsverity.h | 2 ++
> 2 files changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/fs/verity/open.c b/fs/verity/open.c
> index 77b1c977af..956ddaffbf 100644
> --- a/fs/verity/open.c
> +++ b/fs/verity/open.c
> @@ -350,7 +350,7 @@
> return 0;
> }
>
> -static int ensure_verity_info(struct inode *inode)
> +int fsverity_ensure_verity_info(struct inode *inode)
Looks mostly fine to me, but does fsverity_ensure_verity_info need its
own EXPORT_SYMBOL_GPL() for when xfs is built as an external module?
--D
> {
> struct fsverity_info *vi = fsverity_get_info(inode);
> struct fsverity_descriptor *desc;
> @@ -380,7 +380,7 @@
> {
> if (filp->f_mode & FMODE_WRITE)
> return -EPERM;
> - return ensure_verity_info(inode);
> + return fsverity_ensure_verity_info(inode);
> }
> EXPORT_SYMBOL_GPL(__fsverity_file_open);
>
> diff --git a/include/linux/fsverity.h b/include/linux/fsverity.h
> index b75e232890..16a6e51705 100644
> --- a/include/linux/fsverity.h
> +++ b/include/linux/fsverity.h
> @@ -196,6 +196,8 @@
> int __fsverity_prepare_setattr(struct dentry *dentry, struct iattr *attr);
> void __fsverity_cleanup_inode(struct inode *inode);
>
> +int fsverity_ensure_verity_info(struct inode *inode);
> +
> /**
> * fsverity_cleanup_inode() - free the inode's verity info, if present
> * @inode: an inode being evicted
>
> --
> - Andrey
>
>
^ permalink raw reply [flat|nested] 86+ messages in thread
* [PATCH v2 3/22] iomap: introduce IOMAP_F_BEYOND_EOF
2026-01-12 14:49 [PATCH v2 0/23] fs-verity support for XFS with post EOF merkle tree Andrey Albershteyn
2026-01-12 14:49 ` [PATCH v2 1/22] fsverity: report validation errors back to the filesystem Darrick J. Wong
2026-01-12 14:49 ` [PATCH v2 2/22] fsverity: expose ensure_fsverity_info() Andrey Albershteyn
@ 2026-01-12 14:50 ` Andrey Albershteyn
2026-01-12 22:18 ` Darrick J. Wong
2026-01-16 21:52 ` Matthew Wilcox
2026-01-12 14:50 ` [PATCH v2 4/22] iomap: allow iomap_file_buffered_write() take iocb without file Andrey Albershteyn
` (19 subsequent siblings)
22 siblings, 2 replies; 86+ messages in thread
From: Andrey Albershteyn @ 2026-01-12 14:50 UTC (permalink / raw)
To: fsverity, linux-xfs, ebiggers, linux-fsdevel, aalbersh, aalbersh,
djwong
Cc: djwong, david, hch
Flag to indicate to iomap that read/write is happening beyond EOF and no
isize checks/update is needed.
Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
---
fs/iomap/buffered-io.c | 13 ++++++++-----
fs/iomap/trace.h | 3 ++-
include/linux/iomap.h | 5 +++++
3 files changed, 15 insertions(+), 6 deletions(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index e5c1ca440d..cc1cbf2a4c 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -533,7 +533,8 @@
return 0;
/* zero post-eof blocks as the page may be mapped */
- if (iomap_block_needs_zeroing(iter, pos)) {
+ if (iomap_block_needs_zeroing(iter, pos) &&
+ !(iomap->flags & IOMAP_F_BEYOND_EOF)) {
folio_zero_range(folio, poff, plen);
iomap_set_range_uptodate(folio, poff, plen);
} else {
@@ -1130,13 +1131,14 @@
* unlock and release the folio.
*/
old_size = iter->inode->i_size;
- if (pos + written > old_size) {
+ if (pos + written > old_size &&
+ !(iter->flags & IOMAP_F_BEYOND_EOF)) {
i_size_write(iter->inode, pos + written);
iter->iomap.flags |= IOMAP_F_SIZE_CHANGED;
}
__iomap_put_folio(iter, write_ops, written, folio);
- if (old_size < pos)
+ if (old_size < pos && !(iter->flags & IOMAP_F_BEYOND_EOF))
pagecache_isize_extended(iter->inode, old_size, pos);
cond_resched();
@@ -1815,8 +1817,9 @@
trace_iomap_writeback_folio(inode, pos, folio_size(folio));
- if (!iomap_writeback_handle_eof(folio, inode, &end_pos))
- return 0;
+ if (!(wpc->iomap.flags & IOMAP_F_BEYOND_EOF) &&
+ !iomap_writeback_handle_eof(folio, inode, &end_pos))
+ return 0;
WARN_ON_ONCE(end_pos <= pos);
if (i_blocks_per_folio(inode, folio) > 1) {
diff --git a/fs/iomap/trace.h b/fs/iomap/trace.h
index 532787277b..f1895f7ae5 100644
--- a/fs/iomap/trace.h
+++ b/fs/iomap/trace.h
@@ -118,7 +118,8 @@
{ IOMAP_F_ATOMIC_BIO, "ATOMIC_BIO" }, \
{ IOMAP_F_PRIVATE, "PRIVATE" }, \
{ IOMAP_F_SIZE_CHANGED, "SIZE_CHANGED" }, \
- { IOMAP_F_STALE, "STALE" }
+ { IOMAP_F_STALE, "STALE" }, \
+ { IOMAP_F_BEYOND_EOF, "BEYOND_EOF" }
#define IOMAP_DIO_STRINGS \
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 520e967cb5..7a7e31c499 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -86,6 +86,11 @@
#define IOMAP_F_PRIVATE (1U << 12)
/*
+ * IO happens beyound inode EOF
+ */
+#define IOMAP_F_BEYOND_EOF (1U << 13)
+
+/*
* Flags set by the core iomap code during operations:
*
* IOMAP_F_SIZE_CHANGED indicates to the iomap_end method that the file size
--
- Andrey
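
The i_size rule in the write path can be modelled in a few lines of
user-space C. This is a sketch under assumptions, not the iomap code:
demo_file and demo_write_end() are hypothetical names, and the beyond_eof
argument stands in for the new flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the inode state the write path touches. */
struct demo_file {
	int64_t isize;		/* models i_size */
	bool size_changed;	/* models IOMAP_F_SIZE_CHANGED */
};

/*
 * Model of the end of a buffered write: a write that ends past the
 * current size normally extends the file, but a write flagged as
 * beyond-EOF (Merkle tree data) must leave the size untouched.
 */
static void demo_write_end(struct demo_file *f, int64_t pos, int64_t written,
			   bool beyond_eof)
{
	assert(f != NULL && pos >= 0 && written >= 0);

	if (pos + written > f->isize && !beyond_eof) {
		f->isize = pos + written;
		f->size_changed = true;
	}
}
```

With a 10000-byte file, an ordinary 4096-byte write at offset 8192 grows
the model file to 12288 bytes, while the same write at offset 65536 with
beyond_eof set leaves the size at 10000.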
^ permalink raw reply related [flat|nested] 86+ messages in thread
* Re: [PATCH v2 3/22] iomap: introduce IOMAP_F_BEYOND_EOF
2026-01-12 14:50 ` [PATCH v2 3/22] iomap: introduce IOMAP_F_BEYOND_EOF Andrey Albershteyn
@ 2026-01-12 22:18 ` Darrick J. Wong
2026-01-12 22:31 ` Darrick J. Wong
2026-01-13 8:12 ` Christoph Hellwig
2026-01-16 21:52 ` Matthew Wilcox
1 sibling, 2 replies; 86+ messages in thread
From: Darrick J. Wong @ 2026-01-12 22:18 UTC (permalink / raw)
To: Andrey Albershteyn
Cc: fsverity, linux-xfs, ebiggers, linux-fsdevel, aalbersh, david,
hch
On Mon, Jan 12, 2026 at 03:50:05PM +0100, Andrey Albershteyn wrote:
> Flag to indicate to iomap that read/write is happening beyond EOF and no
> isize checks/update is needed.
>
> Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
> ---
> fs/iomap/buffered-io.c | 13 ++++++++-----
> fs/iomap/trace.h | 3 ++-
> include/linux/iomap.h | 5 +++++
> 3 files changed, 15 insertions(+), 6 deletions(-)
>
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index e5c1ca440d..cc1cbf2a4c 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -533,7 +533,8 @@
(Does your diff program not set --show-c-function? That makes reviewing
harder because I have to search on the comment text to figure out which
function this is)
> return 0;
>
> /* zero post-eof blocks as the page may be mapped */
> - if (iomap_block_needs_zeroing(iter, pos)) {
> + if (iomap_block_needs_zeroing(iter, pos) &&
> + !(iomap->flags & IOMAP_F_BEYOND_EOF)) {
Hrm. The last test in iomap_block_needs_zeroing is if pos is at or
beyond EOF, and iomap_adjust_read_range takes great pains to reduce plen
so that poff/plen never cross EOF. I think the intent of that code is
to ensure that we always zero the post-EOF part of a folio when reading
it in from disk.
For verity I can see why you don't want to zero the merkle tree blocks
beyond EOF, but I think this code can expose unwritten junk in the
post-EOF part of the EOF block on disk.
Would it be more correct to do:
static inline bool
iomap_block_needs_zeroing(
const struct iomap_iter *iter,
struct folio *folio,
loff_t pos)
{
const struct iomap *srcmap = iomap_iter_srcmap(iter);
if (srcmap->type != IOMAP_MAPPED)
return true;
if (srcmap->flags & IOMAP_F_NEW)
return true;
/*
* Merkle tree exists in a separate folio beyond EOF, so
* only zero if this is the EOF folio.
*/
if (iomap->flags & IOMAP_F_BEYOND_EOF)
return folio_pos(folio) == i_size_read(iter->inode);
return pos >= i_size_read(iter->inode);
}
> folio_zero_range(folio, poff, plen);
> iomap_set_range_uptodate(folio, poff, plen);
> } else {
> @@ -1130,13 +1131,14 @@
> * unlock and release the folio.
> */
> old_size = iter->inode->i_size;
> - if (pos + written > old_size) {
> + if (pos + written > old_size &&
> + !(iter->flags & IOMAP_F_BEYOND_EOF)) {
> i_size_write(iter->inode, pos + written);
> iter->iomap.flags |= IOMAP_F_SIZE_CHANGED;
> }
> __iomap_put_folio(iter, write_ops, written, folio);
>
> - if (old_size < pos)
> + if (old_size < pos && !(iter->flags & IOMAP_F_BEYOND_EOF))
> pagecache_isize_extended(iter->inode, old_size, pos);
>
> cond_resched();
> @@ -1815,8 +1817,9 @@
>
> trace_iomap_writeback_folio(inode, pos, folio_size(folio));
>
> - if (!iomap_writeback_handle_eof(folio, inode, &end_pos))
> - return 0;
> + if (!(wpc->iomap.flags & IOMAP_F_BEYOND_EOF) &&
> + !iomap_writeback_handle_eof(folio, inode, &end_pos))
Hrm. I /think/ this might break post-eof zeroing on writeback if
BEYOND_EOF is set. For verity this isn't a problem because there's no
writeback, but it's a bit of a logic bomb if someone ever tries to set
BEYOND_EOF on a non-verity file.
--D
> + return 0;
> WARN_ON_ONCE(end_pos <= pos);
>
> if (i_blocks_per_folio(inode, folio) > 1) {
> diff --git a/fs/iomap/trace.h b/fs/iomap/trace.h
> index 532787277b..f1895f7ae5 100644
> --- a/fs/iomap/trace.h
> +++ b/fs/iomap/trace.h
> @@ -118,7 +118,8 @@
> { IOMAP_F_ATOMIC_BIO, "ATOMIC_BIO" }, \
> { IOMAP_F_PRIVATE, "PRIVATE" }, \
> { IOMAP_F_SIZE_CHANGED, "SIZE_CHANGED" }, \
> - { IOMAP_F_STALE, "STALE" }
> + { IOMAP_F_STALE, "STALE" }, \
> + { IOMAP_F_BEYOND_EOF, "BEYOND_EOF" }
>
>
> #define IOMAP_DIO_STRINGS \
> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index 520e967cb5..7a7e31c499 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -86,6 +86,11 @@
> #define IOMAP_F_PRIVATE (1U << 12)
>
> /*
> + * IO happens beyound inode EOF
s/beyound/beyond/
> + */
> +#define IOMAP_F_BEYOND_EOF (1U << 13)
> +
> +/*
> * Flags set by the core iomap code during operations:
> *
> * IOMAP_F_SIZE_CHANGED indicates to the iomap_end method that the file size
>
> --
> - Andrey
>
>
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH v2 3/22] iomap: introduce IOMAP_F_BEYOND_EOF
2026-01-12 22:18 ` Darrick J. Wong
@ 2026-01-12 22:31 ` Darrick J. Wong
2026-01-13 10:39 ` Andrey Albershteyn
2026-01-13 8:12 ` Christoph Hellwig
1 sibling, 1 reply; 86+ messages in thread
From: Darrick J. Wong @ 2026-01-12 22:31 UTC (permalink / raw)
To: Andrey Albershteyn
Cc: fsverity, linux-xfs, ebiggers, linux-fsdevel, aalbersh, david,
hch
On Mon, Jan 12, 2026 at 02:18:53PM -0800, Darrick J. Wong wrote:
> On Mon, Jan 12, 2026 at 03:50:05PM +0100, Andrey Albershteyn wrote:
> > Flag to indicate to iomap that read/write is happening beyond EOF and no
> > isize checks/update is needed.
> >
> > Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
> > ---
> > fs/iomap/buffered-io.c | 13 ++++++++-----
> > fs/iomap/trace.h | 3 ++-
> > include/linux/iomap.h | 5 +++++
> > 3 files changed, 15 insertions(+), 6 deletions(-)
> >
> > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> > index e5c1ca440d..cc1cbf2a4c 100644
> > --- a/fs/iomap/buffered-io.c
> > +++ b/fs/iomap/buffered-io.c
> > @@ -533,7 +533,8 @@
>
> (Does your diff program not set --show-c-function? That makes reviewing
> harder because I have to search on the comment text to figure out which
> function this is)
>
> > return 0;
> >
> > /* zero post-eof blocks as the page may be mapped */
> > - if (iomap_block_needs_zeroing(iter, pos)) {
> > + if (iomap_block_needs_zeroing(iter, pos) &&
> > + !(iomap->flags & IOMAP_F_BEYOND_EOF)) {
>
> Hrm. The last test in iomap_block_needs_zeroing is if pos is at or
> beyond EOF, and iomap_adjust_read_range takes great pains to reduce plen
> so that poff/plen never cross EOF. I think the intent of that code is
> to ensure that we always zero the post-EOF part of a folio when reading
> it in from disk.
>
> For verity I can see why you don't want to zero the merkle tree blocks
> beyond EOF, but I think this code can expose unwritten junk in the
> post-EOF part of the EOF block on disk.
Oh wait, is IOMAP_F_BEYOND_EOF only set on mappings that are entirely
beyond EOF, aka the merkle tree extents?
--D
> Would it be more correct to do:
>
> static inline bool
> iomap_block_needs_zeroing(
> const struct iomap_iter *iter,
> struct folio *folio,
> loff_t pos)
> {
> const struct iomap *srcmap = iomap_iter_srcmap(iter);
>
> if (srcmap->type != IOMAP_MAPPED)
> return true;
> if (srcmap->flags & IOMAP_F_NEW)
> return true;
>
> /*
> * Merkle tree exists in a separate folio beyond EOF, so
> * only zero if this is the EOF folio.
> */
> if (iomap->flags & IOMAP_F_BEYOND_EOF)
> return folio_pos(folio) == i_size_read(iter->inode);
>
> return pos >= i_size_read(iter->inode);
> }
>
> > folio_zero_range(folio, poff, plen);
> > iomap_set_range_uptodate(folio, poff, plen);
> > } else {
> > @@ -1130,13 +1131,14 @@
> > * unlock and release the folio.
> > */
> > old_size = iter->inode->i_size;
> > - if (pos + written > old_size) {
> > + if (pos + written > old_size &&
> > + !(iter->flags & IOMAP_F_BEYOND_EOF)) {
> > i_size_write(iter->inode, pos + written);
> > iter->iomap.flags |= IOMAP_F_SIZE_CHANGED;
> > }
> > __iomap_put_folio(iter, write_ops, written, folio);
> >
> > - if (old_size < pos)
> > + if (old_size < pos && !(iter->flags & IOMAP_F_BEYOND_EOF))
> > pagecache_isize_extended(iter->inode, old_size, pos);
> >
> > cond_resched();
> > @@ -1815,8 +1817,9 @@
> >
> > trace_iomap_writeback_folio(inode, pos, folio_size(folio));
> >
> > - if (!iomap_writeback_handle_eof(folio, inode, &end_pos))
> > - return 0;
> > + if (!(wpc->iomap.flags & IOMAP_F_BEYOND_EOF) &&
> > + !iomap_writeback_handle_eof(folio, inode, &end_pos))
>
> Hrm. I /think/ this might break post-eof zeroing on writeback if
> BEYOND_EOF is set. For verity this isn't a problem because there's no
> writeback, but it's a bit of a logic bomb if someone ever tries to set
> BEYOND_EOF on a non-verity file.
>
> --D
>
> > + return 0;
> > WARN_ON_ONCE(end_pos <= pos);
> >
> > if (i_blocks_per_folio(inode, folio) > 1) {
> > diff --git a/fs/iomap/trace.h b/fs/iomap/trace.h
> > index 532787277b..f1895f7ae5 100644
> > --- a/fs/iomap/trace.h
> > +++ b/fs/iomap/trace.h
> > @@ -118,7 +118,8 @@
> > { IOMAP_F_ATOMIC_BIO, "ATOMIC_BIO" }, \
> > { IOMAP_F_PRIVATE, "PRIVATE" }, \
> > { IOMAP_F_SIZE_CHANGED, "SIZE_CHANGED" }, \
> > - { IOMAP_F_STALE, "STALE" }
> > + { IOMAP_F_STALE, "STALE" }, \
> > + { IOMAP_F_BEYOND_EOF, "BEYOND_EOF" }
> >
> >
> > #define IOMAP_DIO_STRINGS \
> > diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> > index 520e967cb5..7a7e31c499 100644
> > --- a/include/linux/iomap.h
> > +++ b/include/linux/iomap.h
> > @@ -86,6 +86,11 @@
> > #define IOMAP_F_PRIVATE (1U << 12)
> >
> > /*
> > + * IO happens beyound inode EOF
>
> s/beyound/beyond/
>
> > + */
> > +#define IOMAP_F_BEYOND_EOF (1U << 13)
> > +
> > +/*
> > * Flags set by the core iomap code during operations:
> > *
> > * IOMAP_F_SIZE_CHANGED indicates to the iomap_end method that the file size
> >
> > --
> > - Andrey
> >
> >
>
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH v2 3/22] iomap: introduce IOMAP_F_BEYOND_EOF
2026-01-12 22:31 ` Darrick J. Wong
@ 2026-01-13 10:39 ` Andrey Albershteyn
0 siblings, 0 replies; 86+ messages in thread
From: Andrey Albershteyn @ 2026-01-13 10:39 UTC (permalink / raw)
To: Darrick J. Wong
Cc: fsverity, linux-xfs, ebiggers, linux-fsdevel, aalbersh, david,
hch
On 2026-01-12 14:31:58, Darrick J. Wong wrote:
> On Mon, Jan 12, 2026 at 02:18:53PM -0800, Darrick J. Wong wrote:
> > On Mon, Jan 12, 2026 at 03:50:05PM +0100, Andrey Albershteyn wrote:
> > > Flag to indicate to iomap that read/write is happening beyond EOF and no
> > > isize checks/update is needed.
> > >
> > > Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
> > > ---
> > > fs/iomap/buffered-io.c | 13 ++++++++-----
> > > fs/iomap/trace.h | 3 ++-
> > > include/linux/iomap.h | 5 +++++
> > > 3 files changed, 15 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> > > index e5c1ca440d..cc1cbf2a4c 100644
> > > --- a/fs/iomap/buffered-io.c
> > > +++ b/fs/iomap/buffered-io.c
> > > @@ -533,7 +533,8 @@
> >
> > (Does your diff program not set --show-c-function? That makes reviewing
> > harder because I have to search on the comment text to figure out which
> > function this is)
> >
> > > return 0;
> > >
> > > /* zero post-eof blocks as the page may be mapped */
> > > - if (iomap_block_needs_zeroing(iter, pos)) {
> > > + if (iomap_block_needs_zeroing(iter, pos) &&
> > > + !(iomap->flags & IOMAP_F_BEYOND_EOF)) {
> >
> > Hrm. The last test in iomap_block_needs_zeroing is if pos is at or
> > beyond EOF, and iomap_adjust_read_range takes great pains to reduce plen
> > so that poff/plen never cross EOF. I think the intent of that code is
> > to ensure that we always zero the post-EOF part of a folio when reading
> > it in from disk.
> >
> > For verity I can see why you don't want to zero the merkle tree blocks
> > beyond EOF, but I think this code can expose unwritten junk in the
> > post-EOF part of the EOF block on disk.
>
> Oh wait, is IOMAP_F_BEYOND_EOF only set on mappings that are entirely
> beyond EOF, aka the merkle tree extents?
yes, and only on fsverity files, so no normal read can get
BEYOND_EOF
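
Given that answer, the read-side decision in this version of the patch can
be modelled as follows. This is a hedged user-space sketch: the demo_*
names and the bit chosen for DEMO_F_NEW are invented for illustration, and
demo_block_needs_zeroing() only mirrors the three conditions discussed
above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the iomap mapping type and flags. */
enum demo_map_type { DEMO_HOLE, DEMO_MAPPED };

#define DEMO_F_NEW		(1U << 0)
#define DEMO_F_BEYOND_EOF	(1U << 13)

struct demo_map {
	enum demo_map_type type;
	unsigned int flags;
};

/* Unmapped, newly allocated, or at/after EOF: nothing valid on disk. */
static bool demo_block_needs_zeroing(const struct demo_map *map, int64_t pos,
				     int64_t isize)
{
	return map->type != DEMO_MAPPED ||
	       (map->flags & DEMO_F_NEW) ||
	       pos >= isize;
}

/*
 * Model of the read path in this patch: zero the range unless the
 * mapping is flagged as lying entirely beyond EOF, in which case the
 * block holds Merkle tree data and must be read from disk.
 */
static bool demo_read_should_zero(const struct demo_map *map, int64_t pos,
				  int64_t isize)
{
	assert(map != NULL);

	return demo_block_needs_zeroing(map, pos, isize) &&
	       !(map->flags & DEMO_F_BEYOND_EOF);
}
```

Because the flag is only ever set on mappings that lie wholly past EOF,
the block containing EOF itself never carries it and keeps its normal
post-EOF zeroing in this model.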
>
> --D
>
> > Would it be more correct to do:
> >
> > static inline bool
> > iomap_block_needs_zeroing(
> > const struct iomap_iter *iter,
> > struct folio *folio,
> > loff_t pos)
> > {
> > const struct iomap *srcmap = iomap_iter_srcmap(iter);
> >
> > if (srcmap->type != IOMAP_MAPPED)
> > return true;
> > if (srcmap->flags & IOMAP_F_NEW)
> > return true;
> >
> > /*
> > * Merkle tree exists in a separate folio beyond EOF, so
> > * only zero if this is the EOF folio.
> > */
> > if (iter->iomap.flags & IOMAP_F_BEYOND_EOF)
> > return folio_pos(folio) == i_size_read(iter->inode);
> >
> > return pos >= i_size_read(iter->inode);
> > }
> >
> > > folio_zero_range(folio, poff, plen);
> > > iomap_set_range_uptodate(folio, poff, plen);
> > > } else {
> > > @@ -1130,13 +1131,14 @@
> > > * unlock and release the folio.
> > > */
> > > old_size = iter->inode->i_size;
> > > - if (pos + written > old_size) {
> > > + if (pos + written > old_size &&
> > > + !(iter->flags & IOMAP_F_BEYOND_EOF)) {
> > > i_size_write(iter->inode, pos + written);
> > > iter->iomap.flags |= IOMAP_F_SIZE_CHANGED;
> > > }
> > > __iomap_put_folio(iter, write_ops, written, folio);
> > >
> > > - if (old_size < pos)
> > > + if (old_size < pos && !(iter->flags & IOMAP_F_BEYOND_EOF))
> > > pagecache_isize_extended(iter->inode, old_size, pos);
> > >
> > > cond_resched();
> > > @@ -1815,8 +1817,9 @@
> > >
> > > trace_iomap_writeback_folio(inode, pos, folio_size(folio));
> > >
> > > - if (!iomap_writeback_handle_eof(folio, inode, &end_pos))
> > > - return 0;
> > > + if (!(wpc->iomap.flags & IOMAP_F_BEYOND_EOF) &&
> > > + !iomap_writeback_handle_eof(folio, inode, &end_pos))
> >
> > Hrm. I /think/ this might break post-eof zeroing on writeback if
> > BEYOND_EOF is set. For verity this isn't a problem because there's no
> > writeback, but it's a bit of a logic bomb if someone ever tries to set
> > BEYOND_EOF on a non-verity file.
I will rename it to IOMAP_F_FSVERITY
> >
> > --D
> >
> > > + return 0;
> > > WARN_ON_ONCE(end_pos <= pos);
> > >
> > > if (i_blocks_per_folio(inode, folio) > 1) {
> > > diff --git a/fs/iomap/trace.h b/fs/iomap/trace.h
> > > index 532787277b..f1895f7ae5 100644
> > > --- a/fs/iomap/trace.h
> > > +++ b/fs/iomap/trace.h
> > > @@ -118,7 +118,8 @@
> > > { IOMAP_F_ATOMIC_BIO, "ATOMIC_BIO" }, \
> > > { IOMAP_F_PRIVATE, "PRIVATE" }, \
> > > { IOMAP_F_SIZE_CHANGED, "SIZE_CHANGED" }, \
> > > - { IOMAP_F_STALE, "STALE" }
> > > + { IOMAP_F_STALE, "STALE" }, \
> > > + { IOMAP_F_BEYOND_EOF, "BEYOND_EOF" }
> > >
> > >
> > > #define IOMAP_DIO_STRINGS \
> > > diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> > > index 520e967cb5..7a7e31c499 100644
> > > --- a/include/linux/iomap.h
> > > +++ b/include/linux/iomap.h
> > > @@ -86,6 +86,11 @@
> > > #define IOMAP_F_PRIVATE (1U << 12)
> > >
> > > /*
> > > + * IO happens beyound inode EOF
> >
> > s/beyound/beyond/
will fix
--
- Andrey
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH v2 3/22] iomap: introduce IOMAP_F_BEYOND_EOF
2026-01-12 22:18 ` Darrick J. Wong
2026-01-12 22:31 ` Darrick J. Wong
@ 2026-01-13 8:12 ` Christoph Hellwig
2026-01-13 10:50 ` Andrey Albershteyn
1 sibling, 1 reply; 86+ messages in thread
From: Christoph Hellwig @ 2026-01-13 8:12 UTC (permalink / raw)
To: Darrick J. Wong
Cc: Andrey Albershteyn, fsverity, linux-xfs, ebiggers, linux-fsdevel,
aalbersh, david, hch
On Mon, Jan 12, 2026 at 02:18:53PM -0800, Darrick J. Wong wrote:
> On Mon, Jan 12, 2026 at 03:50:05PM +0100, Andrey Albershteyn wrote:
> > Flag to indicate to iomap that read/write is happening beyond EOF and no
> > isize checks/update is needed.
> >
> > Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
> > ---
> > fs/iomap/buffered-io.c | 13 ++++++++-----
> > fs/iomap/trace.h | 3 ++-
> > include/linux/iomap.h | 5 +++++
> > 3 files changed, 15 insertions(+), 6 deletions(-)
> >
> > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> > index e5c1ca440d..cc1cbf2a4c 100644
> > --- a/fs/iomap/buffered-io.c
> > +++ b/fs/iomap/buffered-io.c
> > @@ -533,7 +533,8 @@
>
> (Does your diff program not set --show-c-function? That makes reviewing
> harder because I have to search on the comment text to figure out which
> function this is)
Or use git-send-email which sidesteps all these issues :)
> Hrm. The last test in iomap_block_needs_zeroing is if pos is at or
> beyond EOF, and iomap_adjust_read_range takes great pains to reduce plen
> so that poff/plen never cross EOF. I think the intent of that code is
> to ensure that we always zero the post-EOF part of a folio when reading
> it in from disk.
>
> For verity I can see why you don't want to zero the merkle tree blocks
> beyond EOF, but I think this code can expose unwritten junk in the
> post-EOF part of the EOF block on disk.
>
> Would it be more correct to do:
Or replace the generic past EOF flag with a FSVERITY flag making
the use case clear?
> > @@ -1815,8 +1817,9 @@
> >
> > trace_iomap_writeback_folio(inode, pos, folio_size(folio));
> >
> > - if (!iomap_writeback_handle_eof(folio, inode, &end_pos))
> > - return 0;
> > + if (!(wpc->iomap.flags & IOMAP_F_BEYOND_EOF) &&
> > + !iomap_writeback_handle_eof(folio, inode, &end_pos))
>
> Hrm. I /think/ this might break post-eof zeroing on writeback if
> BEYOND_EOF is set. For verity this isn't a problem because there's no
> writeback, but it's a bit of a logic bomb if someone ever tries to set
> BEYOND_EOF on a non-verity file.
Maybe we should not even support the flag for writeback?
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH v2 3/22] iomap: introduce IOMAP_F_BEYOND_EOF
2026-01-13 8:12 ` Christoph Hellwig
@ 2026-01-13 10:50 ` Andrey Albershteyn
2026-01-13 16:22 ` Christoph Hellwig
0 siblings, 1 reply; 86+ messages in thread
From: Andrey Albershteyn @ 2026-01-13 10:50 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Darrick J. Wong, fsverity, linux-xfs, ebiggers, linux-fsdevel,
aalbersh, david
On 2026-01-13 09:12:20, Christoph Hellwig wrote:
> On Mon, Jan 12, 2026 at 02:18:53PM -0800, Darrick J. Wong wrote:
> > On Mon, Jan 12, 2026 at 03:50:05PM +0100, Andrey Albershteyn wrote:
> > > Flag to indicate to iomap that read/write is happening beyond EOF and no
> > > isize checks/update is needed.
> > >
> > > Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
> > > ---
> > > fs/iomap/buffered-io.c | 13 ++++++++-----
> > > fs/iomap/trace.h | 3 ++-
> > > include/linux/iomap.h | 5 +++++
> > > 3 files changed, 15 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> > > index e5c1ca440d..cc1cbf2a4c 100644
> > > --- a/fs/iomap/buffered-io.c
> > > +++ b/fs/iomap/buffered-io.c
> > > @@ -533,7 +533,8 @@
> >
> > (Does your diff program not set --show-c-function? That makes reviewing
> > harder because I have to search on the comment text to figure out which
> > function this is)
>
> Or use git-send-email which sidesteps all these issues :)
>
> > Hrm. The last test in iomap_block_needs_zeroing is if pos is at or
> > beyond EOF, and iomap_adjust_read_range takes great pains to reduce plen
> > so that poff/plen never cross EOF. I think the intent of that code is
> > to ensure that we always zero the post-EOF part of a folio when reading
> > it in from disk.
> >
> > For verity I can see why you don't want to zero the merkle tree blocks
> > beyond EOF, but I think this code can expose unwritten junk in the
> > post-EOF part of the EOF block on disk.
> >
> > Would it be more correct to do:
>
> Or replace the generic past EOF flag with a FSVERITY flag making
> the use case clear?
>
I will rename it to _FSVERITY
> > > @@ -1815,8 +1817,9 @@
> > >
> > > trace_iomap_writeback_folio(inode, pos, folio_size(folio));
> > >
> > > - if (!iomap_writeback_handle_eof(folio, inode, &end_pos))
> > > - return 0;
> > > + if (!(wpc->iomap.flags & IOMAP_F_BEYOND_EOF) &&
> > > + !iomap_writeback_handle_eof(folio, inode, &end_pos))
> >
> > Hrm. I /think/ this might break post-eof zeroing on writeback if
> > BEYOND_EOF is set. For verity this isn't a problem because there's no
> > writeback, but it's a bit of a logic bomb if someone ever tries to set
> > BEYOND_EOF on a non-verity file.
>
> Maybe we should not even support the flag for writeback?
>
Hmm, I'm not sure how you see this working. The verity pages need to get
written somehow, and as they are beyond EOF they will be zeroed out here.
Regarding your comment in the thread about using direct IO: fsverity
wants to use the page cache as a cache for verified pages (an already
verified page will have the "verified" flag set, a freshly read one won't).
So I think writeback has to know how to handle them.
--
- Andrey
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH v2 3/22] iomap: introduce IOMAP_F_BEYOND_EOF
2026-01-13 10:50 ` Andrey Albershteyn
@ 2026-01-13 16:22 ` Christoph Hellwig
2026-01-13 17:57 ` Darrick J. Wong
0 siblings, 1 reply; 86+ messages in thread
From: Christoph Hellwig @ 2026-01-13 16:22 UTC (permalink / raw)
To: Andrey Albershteyn
Cc: Christoph Hellwig, Darrick J. Wong, fsverity, linux-xfs, ebiggers,
linux-fsdevel, aalbersh, david
On Tue, Jan 13, 2026 at 11:50:07AM +0100, Andrey Albershteyn wrote:
>
> Hmm not sure how you see this, the verity pages need to get written
> somehow and as they are beyond EOF they will be zeroed out here.
>
> Regarding your comment in the thread to use direct IO, fsverity
> wants to use page cache as cache for verified pages (already
> verified page will have "verified" flag set, freshly read won't).
Ok, I guess we need to stick to the pagecache based path then.
The upside is fewer differences from the existing file systems.
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH v2 3/22] iomap: introduce IOMAP_F_BEYOND_EOF
2026-01-13 16:22 ` Christoph Hellwig
@ 2026-01-13 17:57 ` Darrick J. Wong
0 siblings, 0 replies; 86+ messages in thread
From: Darrick J. Wong @ 2026-01-13 17:57 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Andrey Albershteyn, fsverity, linux-xfs, ebiggers, linux-fsdevel,
aalbersh, david
On Tue, Jan 13, 2026 at 05:22:02PM +0100, Christoph Hellwig wrote:
> On Tue, Jan 13, 2026 at 11:50:07AM +0100, Andrey Albershteyn wrote:
> >
> > Hmm not sure how you see this, the verity pages need to get written
> > somehow and as they are beyond EOF they will be zeroed out here.
> >
> > Regarding your comment in the thread to use direct IO, fsverity
> > wants to use page cache as cache for verified pages (already
> > verified page will have "verified" flag set, freshly read won't).
>
> Ok, I guess we need to stick to the pagecache based path then.
> The upside is fewer differences from the existing file systems.
<nod> IIRC fsverity also doesn't allow directio, so it seems more
consistent to me that newly created fsverity artifacts are written
through the pagecache IO paths.
--D
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH v2 3/22] iomap: introduce IOMAP_F_BEYOND_EOF
2026-01-12 14:50 ` [PATCH v2 3/22] iomap: introduce IOMAP_F_BEYOND_EOF Andrey Albershteyn
2026-01-12 22:18 ` Darrick J. Wong
@ 2026-01-16 21:52 ` Matthew Wilcox
2026-01-17 2:11 ` Darrick J. Wong
1 sibling, 1 reply; 86+ messages in thread
From: Matthew Wilcox @ 2026-01-16 21:52 UTC (permalink / raw)
To: Andrey Albershteyn
Cc: fsverity, linux-xfs, ebiggers, linux-fsdevel, aalbersh, djwong,
david, hch
On Mon, Jan 12, 2026 at 03:50:05PM +0100, Andrey Albershteyn wrote:
> Flag to indicate to iomap that read/write is happening beyond EOF and no
> isize checks/update is needed.
Darrick and I talked to Ted about this yesterday, and we think you don't
need the writeback portions of this. (As Darrick said, it's hard to
tell if this is all writeback or just some of it).
The flow for creating a verity file should be:
write data to pagecache
calculate merkle tree data, write it to pagecache
write_and_wait the entire file
do verity magic
If you follow this flow, there's never a need to do writeback past EOF.
Or indeed writeback on a verity file at all, since it's now immutable.
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH v2 3/22] iomap: introduce IOMAP_F_BEYOND_EOF
2026-01-16 21:52 ` Matthew Wilcox
@ 2026-01-17 2:11 ` Darrick J. Wong
0 siblings, 0 replies; 86+ messages in thread
From: Darrick J. Wong @ 2026-01-17 2:11 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Andrey Albershteyn, fsverity, linux-xfs, ebiggers, linux-fsdevel,
aalbersh, david, hch
On Fri, Jan 16, 2026 at 09:52:07PM +0000, Matthew Wilcox wrote:
> On Mon, Jan 12, 2026 at 03:50:05PM +0100, Andrey Albershteyn wrote:
> > Flag to indicate to iomap that read/write is happening beyond EOF and no
> > isize checks/update is needed.
>
> Darrick and I talked to Ted about this yesterday, and we think you don't
> need the writeback portions of this. (As Darrick said, it's hard to
> tell if this is all writeback or just some of it).
>
> The flow for creating a verity file should be:
>
> write data to pagecache
> calculate merkle tree data, write it to pagecache
> write_and_wait the entire file
> do verity magic
>
> If you follow this flow, there's never a need to do writeback past EOF.
> Or indeed writeback on a verity file at all, since it's now immutable.
Hrmm, I came to rather opposite conclusions -- if you're going to write
the fsverity metadata via the pagecache then you *do* have to fix iomap
writeback to deal with post-EOF data correctly. Right now it just
ignores everything beyond isize, and I think the xfs side of things will
still do things like try to update the ondisk size in the ioend.
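A rough userspace model of what I mean, assuming a much simplified version of the EOF handling in iomap writeback. Everything named model_* or MODEL_* is hypothetical, and the flag bit is an arbitrary value picked for the example:

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_F_BEYOND_EOF	(1U << 0)	/* hypothetical bit, illustration only */

/*
 * Simplified stand-in for iomap_writeback_handle_eof(): returns false when
 * the folio lies entirely past isize, otherwise trims *end_pos to isize.
 */
static bool model_writeback_handle_eof(uint64_t folio_pos, uint64_t isize,
				       uint64_t *end_pos)
{
	if (folio_pos >= isize)
		return false;
	if (*end_pos > isize)
		*end_pos = isize;
	return true;
}

/* Returns how many bytes writeback would submit for one folio. */
static uint64_t model_writeback_folio(uint64_t folio_pos, uint64_t folio_size,
				      uint64_t isize, unsigned int flags)
{
	uint64_t end_pos = folio_pos + folio_size;

	if (!(flags & MODEL_F_BEYOND_EOF) &&
	    !model_writeback_handle_eof(folio_pos, isize, &end_pos))
		return 0;
	return end_pos - folio_pos;
}
```

In this model a merkle tree folio at offset 65536 of a file with isize 5000 is dropped unless the flag is set. It also shows the other half of the problem: with the flag set, the folio that straddles EOF is no longer trimmed to isize either.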
I think hch or someone said that maybe those writes could be directio,
though using the pagecache for verity setup means it's all incore and
ready to go when the ioctl returns.
OTOH it would be convenient to preallocate unwritten extents for the
merkle tree, write it with directio, and now if isize ever gets
corrupted such that userspace actually /can/ read the merkle tree data,
they'll at least see zeroes. Though I think there's an ioctl that lets
you do that anyway.
--D
^ permalink raw reply [flat|nested] 86+ messages in thread
* [PATCH v2 4/22] iomap: allow iomap_file_buffered_write() take iocb without file
2026-01-12 14:49 [PATCH v2 0/23] fs-verity support for XFS with post EOF merkle tree Andrey Albershteyn
` (2 preceding siblings ...)
2026-01-12 14:50 ` [PATCH v2 3/22] iomap: introduce IOMAP_F_BEYOND_EOF Andrey Albershteyn
@ 2026-01-12 14:50 ` Andrey Albershteyn
2026-01-12 22:22 ` Darrick J. Wong
2026-01-12 14:50 ` [PATCH v2 5/22] iomap: integrate fs-verity verification into iomap's read path Andrey Albershteyn
` (18 subsequent siblings)
22 siblings, 1 reply; 86+ messages in thread
From: Andrey Albershteyn @ 2026-01-12 14:50 UTC (permalink / raw)
To: fsverity, linux-xfs, ebiggers, linux-fsdevel, aalbersh, aalbersh,
djwong
Cc: djwong, david, hch
This will be necessary for XFS to use iomap_file_buffered_write() in a
context without a file pointer.
As the only user of this is XFS fsverity, let's set the necessary
IOMAP_F_BEYOND_EOF flag if no file is provided instead of adding new flags
to iocb->ki_flags.
Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
---
fs/iomap/buffered-io.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index cc1cbf2a4c..79d1c97f02 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1173,7 +1173,6 @@
const struct iomap_write_ops *write_ops, void *private)
{
struct iomap_iter iter = {
- .inode = iocb->ki_filp->f_mapping->host,
.pos = iocb->ki_pos,
.len = iov_iter_count(i),
.flags = IOMAP_WRITE,
@@ -1181,6 +1180,13 @@
};
ssize_t ret;
+ if (iocb->ki_filp) {
+ iter.inode = iocb->ki_filp->f_mapping->host;
+ } else {
+ iter.inode = (struct inode *)private;
+ iter.flags |= IOMAP_F_BEYOND_EOF;
+ }
+
if (iocb->ki_flags & IOCB_NOWAIT)
iter.flags |= IOMAP_NOWAIT;
if (iocb->ki_flags & IOCB_DONTCACHE)
--
- Andrey
^ permalink raw reply related [flat|nested] 86+ messages in thread
* Re: [PATCH v2 4/22] iomap: allow iomap_file_buffered_write() take iocb without file
2026-01-12 14:50 ` [PATCH v2 4/22] iomap: allow iomap_file_buffered_write() take iocb without file Andrey Albershteyn
@ 2026-01-12 22:22 ` Darrick J. Wong
2026-01-13 8:15 ` Christoph Hellwig
0 siblings, 1 reply; 86+ messages in thread
From: Darrick J. Wong @ 2026-01-12 22:22 UTC (permalink / raw)
To: Andrey Albershteyn
Cc: fsverity, linux-xfs, ebiggers, linux-fsdevel, aalbersh, david,
hch
On Mon, Jan 12, 2026 at 03:50:18PM +0100, Andrey Albershteyn wrote:
> This will be necessary for XFS to use iomap_file_buffered_write() in
> context without file pointer.
>
> As the only user of this is XFS fsverity let's set necessary
> IOMAP_F_BEYOND_EOF flag if no file provided instead of adding new flags
> to iocb->ki_flags.
>
> Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
> ---
> fs/iomap/buffered-io.c | 8 +++++++-
> 1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index cc1cbf2a4c..79d1c97f02 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -1173,7 +1173,6 @@
> const struct iomap_write_ops *write_ops, void *private)
> {
> struct iomap_iter iter = {
> - .inode = iocb->ki_filp->f_mapping->host,
> .pos = iocb->ki_pos,
> .len = iov_iter_count(i),
> .flags = IOMAP_WRITE,
> @@ -1181,6 +1180,13 @@
> };
> ssize_t ret;
>
> + if (iocb->ki_filp) {
> + iter.inode = iocb->ki_filp->f_mapping->host;
> + } else {
> + iter.inode = (struct inode *)private;
@private is for the filesystem implementation to access, not the generic
iomap code. If this is intended for fsverity, then shouldn't merkle
tree construction be the only time that fsverity writes to the file?
And shouldn't fsverity therefore have access to the struct file?
> + iter.flags |= IOMAP_F_BEYOND_EOF;
IOMAP_F_ flags are mapping state flags for struct iomap::flags, not the
iomap_iter.
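To illustrate why mixing the two namespaces is dangerous, here is a small userspace sketch. All names and bit values below are hypothetical and were deliberately chosen so that the two flag spaces overlap; they are not the real iomap definitions.

```c
#include <stdbool.h>

/* Per-operation flags, meant for the iterator's flags word. */
#define MODEL_OP_WRITE		(1U << 0)
#define MODEL_OP_NOWAIT		(1U << 1)

/* Per-mapping state flags, meant for the mapping's flags word. */
#define MODEL_MAP_F_NEW		(1U << 0)
#define MODEL_MAP_F_BEYOND_EOF	(1U << 1)

struct model_iomap {
	unsigned int flags;		/* MODEL_MAP_F_* only */
};

struct model_iter {
	unsigned int flags;		/* MODEL_OP_* only */
	struct model_iomap iomap;
};

/* Reads the operation namespace. */
static bool model_iter_nowait(const struct model_iter *iter)
{
	return iter->flags & MODEL_OP_NOWAIT;
}

/* Reads the mapping namespace. */
static bool model_map_beyond_eof(const struct model_iter *iter)
{
	return iter->iomap.flags & MODEL_MAP_F_BEYOND_EOF;
}
```

OR-ing MODEL_MAP_F_BEYOND_EOF into iter->flags makes model_iter_nowait() return true while model_map_beyond_eof() still returns false, so the write silently turns into a non-blocking one and the mapping is never marked.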
--D
> + }
> +
> if (iocb->ki_flags & IOCB_NOWAIT)
> iter.flags |= IOMAP_NOWAIT;
> if (iocb->ki_flags & IOCB_DONTCACHE)
>
> --
> - Andrey
>
>
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH v2 4/22] iomap: allow iomap_file_buffered_write() take iocb without file
2026-01-12 22:22 ` Darrick J. Wong
@ 2026-01-13 8:15 ` Christoph Hellwig
2026-01-13 10:53 ` Andrey Albershteyn
2026-01-13 16:43 ` Matthew Wilcox
0 siblings, 2 replies; 86+ messages in thread
From: Christoph Hellwig @ 2026-01-13 8:15 UTC (permalink / raw)
To: Darrick J. Wong
Cc: Andrey Albershteyn, fsverity, linux-xfs, ebiggers, linux-fsdevel,
aalbersh, david, hch
On Mon, Jan 12, 2026 at 02:22:15PM -0800, Darrick J. Wong wrote:
> > + iter.inode = iocb->ki_filp->f_mapping->host;
> > + } else {
> > + iter.inode = (struct inode *)private;
>
> @private is for the filesystem implementation to access, not the generic
> iomap code. If this is intended for fsverity, then shouldn't merkle
> tree construction be the only time that fsverity writes to the file?
> And shouldn't fsverity therefore have access to the struct file?
It's not passed down, but I think it easily could be.
>
> > + iter.flags |= IOMAP_F_BEYOND_EOF;
>
> IOMAP_F_ flags are mapping state flags for struct iomap::flags, not the
> iomap_iter.
But we could fix this part as well by having a specific helper for
fsverity that is called instead of iomap_file_buffered_write.
Neither the iocb, nor struct file are used inside of iomap_write_iter.
So just add a new helper that takes an inode, sets the past-EOF/verify
flag and open codes the iomap_iter/iomap_write_iter loop.
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH v2 4/22] iomap: allow iomap_file_buffered_write() take iocb without file
2026-01-13 8:15 ` Christoph Hellwig
@ 2026-01-13 10:53 ` Andrey Albershteyn
2026-01-13 16:43 ` Matthew Wilcox
1 sibling, 0 replies; 86+ messages in thread
From: Andrey Albershteyn @ 2026-01-13 10:53 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Darrick J. Wong, fsverity, linux-xfs, ebiggers, linux-fsdevel,
aalbersh, david
On 2026-01-13 09:15:35, Christoph Hellwig wrote:
> On Mon, Jan 12, 2026 at 02:22:15PM -0800, Darrick J. Wong wrote:
> > > + iter.inode = iocb->ki_filp->f_mapping->host;
> > > + } else {
> > > + iter.inode = (struct inode *)private;
> >
> > @private is for the filesystem implementation to access, not the generic
> > iomap code. If this is intended for fsverity, then shouldn't merkle
> > tree construction be the only time that fsverity writes to the file?
> > And shouldn't fsverity therefore have access to the struct file?
>
> It's not passed down, but I think it could easily.
>
> >
> > > + iter.flags |= IOMAP_F_BEYOND_EOF;
> >
> > IOMAP_F_ flags are mapping state flags for struct iomap::flags, not the
> > iomap_iter.
>
> But we could fix this part as well by having a specific helper for
> fsverity that is called instead of iomap_file_buffered_write.
> Neither the iocb, nor struct file are used inside of iomap_write_iter.
> So just add a new helper that takes an inode, sets the past-EOF/verify
> flag and open codes the iomap_iter/iomap_write_iter loop.
>
sure, sounds good
--
- Andrey
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH v2 4/22] iomap: allow iomap_file_buffered_write() take iocb without file
2026-01-13 8:15 ` Christoph Hellwig
2026-01-13 10:53 ` Andrey Albershteyn
@ 2026-01-13 16:43 ` Matthew Wilcox
2026-01-14 4:49 ` Matthew Wilcox
2026-01-14 6:41 ` Christoph Hellwig
1 sibling, 2 replies; 86+ messages in thread
From: Matthew Wilcox @ 2026-01-13 16:43 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Darrick J. Wong, Andrey Albershteyn, fsverity, linux-xfs,
ebiggers, linux-fsdevel, aalbersh, david
On Tue, Jan 13, 2026 at 09:15:35AM +0100, Christoph Hellwig wrote:
> On Mon, Jan 12, 2026 at 02:22:15PM -0800, Darrick J. Wong wrote:
> > > + iter.inode = iocb->ki_filp->f_mapping->host;
> > > + } else {
> > > + iter.inode = (struct inode *)private;
> >
> > @private is for the filesystem implementation to access, not the generic
> > iomap code. If this is intended for fsverity, then shouldn't merkle
> > tree construction be the only time that fsverity writes to the file?
> > And shouldn't fsverity therefore have access to the struct file?
>
> It's not passed down, but I think it could easily.
willy@deadly:~/kernel/linux$ git grep ki_filp |grep file_inode | wc
109 575 7766
willy@deadly:~/kernel/linux$ git grep ki_filp |wc
367 1920 23371
I think there's a pretty strong argument for adding ki_inode to
struct kiocb. What do you think?
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH v2 4/22] iomap: allow iomap_file_buffered_write() take iocb without file
2026-01-13 16:43 ` Matthew Wilcox
@ 2026-01-14 4:49 ` Matthew Wilcox
2026-01-14 6:41 ` Christoph Hellwig
1 sibling, 0 replies; 86+ messages in thread
From: Matthew Wilcox @ 2026-01-14 4:49 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Darrick J. Wong, Andrey Albershteyn, fsverity, linux-xfs,
ebiggers, linux-fsdevel, aalbersh, david
On Tue, Jan 13, 2026 at 04:43:48PM +0000, Matthew Wilcox wrote:
> On Tue, Jan 13, 2026 at 09:15:35AM +0100, Christoph Hellwig wrote:
> > On Mon, Jan 12, 2026 at 02:22:15PM -0800, Darrick J. Wong wrote:
> > > > + iter.inode = iocb->ki_filp->f_mapping->host;
> > > > + } else {
> > > > + iter.inode = (struct inode *)private;
> > >
> > > @private is for the filesystem implementation to access, not the generic
> > > iomap code. If this is intended for fsverity, then shouldn't merkle
> > > tree construction be the only time that fsverity writes to the file?
> > > And shouldn't fsverity therefore have access to the struct file?
> >
> > It's not passed down, but I think it could easily.
>
> willy@deadly:~/kernel/linux$ git grep ki_filp |grep file_inode | wc
> 109 575 7766
> willy@deadly:~/kernel/linux$ git grep ki_filp |wc
> 367 1920 23371
>
> I think there's a pretty strong argument for adding ki_inode to
> struct kiocb. What do you think?
Regardless of this, I think it's an architectural mistake to not have
struct file available for reading fsverity metadata. We do have it in
fsverity_ioctl_read_metadata but we don't have it in verify_bh().
But reading the Merkle tree block from the verify_bh() completion handler
is bad for performance anyway. We should submit the metadata reads at
the same time we submit the data reads. And there, we would have the
struct file.
If we ever have ambitions to support Merkle trees on network filesystems,
we'll need to figure out some way to pass the struct file in, so maybe
we should just bite the bullet and do this now?
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH v2 4/22] iomap: allow iomap_file_buffered_write() take iocb without file
2026-01-13 16:43 ` Matthew Wilcox
2026-01-14 4:49 ` Matthew Wilcox
@ 2026-01-14 6:41 ` Christoph Hellwig
2026-01-14 16:43 ` Darrick J. Wong
1 sibling, 1 reply; 86+ messages in thread
From: Christoph Hellwig @ 2026-01-14 6:41 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Christoph Hellwig, Darrick J. Wong, Andrey Albershteyn, fsverity,
linux-xfs, ebiggers, linux-fsdevel, aalbersh, david
On Tue, Jan 13, 2026 at 04:43:48PM +0000, Matthew Wilcox wrote:
> > It's not passed down, but I think it could easily.
>
> willy@deadly:~/kernel/linux$ git grep ki_filp |grep file_inode | wc
> 109 575 7766
> willy@deadly:~/kernel/linux$ git grep ki_filp |wc
> 367 1920 23371
>
> I think there's a pretty strong argument for adding ki_inode to
> struct kiocb. What do you think?
That assumes the reduced pointer dereference is worth bloating the
structure. Feel free to give it a try.
Note that we'd still require the file to be set, otherwise we're
going to have calling conventions from hell.
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH v2 4/22] iomap: allow iomap_file_buffered_write() take iocb without file
2026-01-14 6:41 ` Christoph Hellwig
@ 2026-01-14 16:43 ` Darrick J. Wong
0 siblings, 0 replies; 86+ messages in thread
From: Darrick J. Wong @ 2026-01-14 16:43 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Matthew Wilcox, Andrey Albershteyn, fsverity, linux-xfs, ebiggers,
linux-fsdevel, aalbersh, david
On Wed, Jan 14, 2026 at 07:41:30AM +0100, Christoph Hellwig wrote:
> On Tue, Jan 13, 2026 at 04:43:48PM +0000, Matthew Wilcox wrote:
> > > It's not passed down, but I think it could easily.
> >
> > willy@deadly:~/kernel/linux$ git grep ki_filp |grep file_inode | wc
> > 109 575 7766
> > willy@deadly:~/kernel/linux$ git grep ki_filp |wc
> > 367 1920 23371
> >
> > I think there's a pretty strong argument for adding ki_inode to
> > struct kiocb. What do you think?
>
> That assumes the reduced pointer dereference is worth bloating the
> structure. Feel free to give it a try.
>
> Note that we'd still require the file to be set, otherwise we're
> going to have calling conventions from hell.
I think it makes more sense to make fsverity take the file and pass it
around to the filesystems. After all, fsverity metadata is just more
file IO, right? :D
--D
^ permalink raw reply [flat|nested] 86+ messages in thread
* [PATCH v2 5/22] iomap: integrate fs-verity verification into iomap's read path
2026-01-12 14:49 [PATCH v2 0/23] fs-verity support for XFS with post EOF merkle tree Andrey Albershteyn
` (3 preceding siblings ...)
2026-01-12 14:50 ` [PATCH v2 4/22] iomap: allow iomap_file_buffered_write() take iocb without file Andrey Albershteyn
@ 2026-01-12 14:50 ` Andrey Albershteyn
2026-01-12 22:35 ` Darrick J. Wong
2026-01-13 8:19 ` Christoph Hellwig
2026-01-12 14:50 ` [PATCH v2 6/22] xfs: add fs-verity ro-compat flag Andrey Albershteyn
` (17 subsequent siblings)
22 siblings, 2 replies; 86+ messages in thread
From: Andrey Albershteyn @ 2026-01-12 14:50 UTC (permalink / raw)
To: fsverity, linux-xfs, ebiggers, linux-fsdevel, aalbersh, aalbersh,
djwong
Cc: djwong, david, hch
This patch adds fs-verity verification into iomap's read path. After
the BIO's I/O operation completes, the data is verified against
fs-verity's Merkle tree. The verification work is done in a separate
workqueue.
The read path ioend iomap_read_ioend is stored side by side with
each BIO if FS_VERITY is enabled.
Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
---
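Note on the layout: the work item and the bio live in one allocation (the bioset is initialized with a front pad of offsetof(struct iomap_fsverity_bio, bio)), so both the bio completion and the work function can get back to the wrapper with container_of(). Below is a minimal userspace sketch of that pattern; the model_* types and the simplified container_of macro are assumptions for illustration, not the kernel definitions.

```c
#include <stddef.h>

/* Userspace equivalent of the kernel's container_of(), without type checks. */
#define model_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct model_work { int pending; };
struct model_bio { int status; };

/* Same layout idea as struct iomap_fsverity_bio: work item first, bio last. */
struct model_fsverity_bio {
	struct model_work work;
	struct model_bio bio;
};

/* Completion side: walk from the bio back to its wrapper. */
static struct model_fsverity_bio *model_from_bio(struct model_bio *bio)
{
	return model_container_of(bio, struct model_fsverity_bio, bio);
}

/* Workqueue side: walk from the work item back to the same wrapper. */
static struct model_fsverity_bio *model_from_work(struct model_work *work)
{
	return model_container_of(work, struct model_fsverity_bio, work);
}
```

Both helpers return the same wrapper for a given object, which is what lets the end_io handler queue the work and the work function verify and complete the original bio without any extra lookup.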
fs/iomap/bio.c | 66 ++++++++++++++++++++++++++++++++++++++++++++++++----
fs/iomap/buffered-io.c | 12 ++++++++-
fs/iomap/ioend.c | 41 +++++++++++++++++++++++++++++++-
include/linux/iomap.h | 11 ++++++++
4 files changed, 123 insertions(+), 7 deletions(-)
diff --git a/fs/iomap/bio.c b/fs/iomap/bio.c
index fc045f2e4c..ac6c16b1f8 100644
--- a/fs/iomap/bio.c
+++ b/fs/iomap/bio.c
@@ -5,6 +5,7 @@
*/
#include <linux/iomap.h>
#include <linux/pagemap.h>
+#include <linux/fsverity.h>
#include "internal.h"
#include "trace.h"
@@ -18,6 +19,60 @@
bio_put(bio);
}
+#ifdef CONFIG_FS_VERITY
+static void
+iomap_read_fsverify_end_io_work(struct work_struct *work)
+{
+ struct iomap_fsverity_bio *fbio =
+ container_of(work, struct iomap_fsverity_bio, work);
+
+ fsverity_verify_bio(&fbio->bio);
+ iomap_read_end_io(&fbio->bio);
+}
+
+static void
+iomap_read_fsverity_end_io(struct bio *bio)
+{
+ struct iomap_fsverity_bio *fbio =
+ container_of(bio, struct iomap_fsverity_bio, bio);
+
+ INIT_WORK(&fbio->work, iomap_read_fsverify_end_io_work);
+ fsverity_enqueue_verify_work(&fbio->work);
+}
+
+static struct bio *
+iomap_fsverity_read_bio_alloc(struct inode *inode, struct block_device *bdev,
+ int nr_vecs, gfp_t gfp)
+{
+ struct bio *bio;
+
+ bio = bio_alloc_bioset(bdev, nr_vecs, REQ_OP_READ, gfp,
+ iomap_fsverity_bioset);
+ if (bio)
+ bio->bi_end_io = iomap_read_fsverity_end_io;
+ return bio;
+}
+
+#else
+# define iomap_fsverity_read_bio_alloc(...) (NULL)
+# define iomap_fsverity_tree_end_align(...) (false)
+#endif /* CONFIG_FS_VERITY */
+
+static struct bio *iomap_read_bio_alloc(struct inode *inode,
+ const struct iomap *iomap, int nr_vecs, gfp_t gfp)
+{
+ struct bio *bio;
+ struct block_device *bdev = iomap->bdev;
+
+ if (!(iomap->flags & IOMAP_F_BEYOND_EOF) && fsverity_active(inode))
+ return iomap_fsverity_read_bio_alloc(inode, bdev, nr_vecs, gfp);
+
+ bio = bio_alloc(bdev, nr_vecs, REQ_OP_READ, gfp);
+ if (bio)
+ bio->bi_end_io = iomap_read_end_io;
+ return bio;
+}
+
static void iomap_bio_submit_read(struct iomap_read_folio_ctx *ctx)
{
struct bio *bio = ctx->read_ctx;
@@ -42,26 +97,27 @@
!bio_add_folio(bio, folio, plen, poff)) {
gfp_t gfp = mapping_gfp_constraint(folio->mapping, GFP_KERNEL);
gfp_t orig_gfp = gfp;
- unsigned int nr_vecs = DIV_ROUND_UP(length, PAGE_SIZE);
if (bio)
submit_bio(bio);
if (ctx->rac) /* same as readahead_gfp_mask */
gfp |= __GFP_NORETRY | __GFP_NOWARN;
- bio = bio_alloc(iomap->bdev, bio_max_segs(nr_vecs), REQ_OP_READ,
- gfp);
+ bio = iomap_read_bio_alloc(iter->inode, iomap,
+ bio_max_segs(DIV_ROUND_UP(length, PAGE_SIZE)),
+ gfp);
+
/*
* If the bio_alloc fails, try it again for a single page to
* avoid having to deal with partial page reads. This emulates
* what do_mpage_read_folio does.
*/
if (!bio)
- bio = bio_alloc(iomap->bdev, 1, REQ_OP_READ, orig_gfp);
+ bio = iomap_read_bio_alloc(iter->inode, iomap, 1,
+ orig_gfp);
if (ctx->rac)
bio->bi_opf |= REQ_RAHEAD;
bio->bi_iter.bi_sector = sector;
- bio->bi_end_io = iomap_read_end_io;
bio_add_folio_nofail(bio, folio, plen, poff);
ctx->read_ctx = bio;
}
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 79d1c97f02..481f7e1cff 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -8,6 +8,7 @@
#include <linux/writeback.h>
#include <linux/swap.h>
#include <linux/migrate.h>
+#include <linux/fsverity.h>
#include "internal.h"
#include "trace.h"
@@ -532,10 +533,19 @@
if (plen == 0)
return 0;
+ /* end of fs-verity region*/
+ if ((iomap->flags & IOMAP_F_BEYOND_EOF) && (iomap->type == IOMAP_HOLE)) {
+ folio_zero_range(folio, poff, plen);
+ iomap_set_range_uptodate(folio, poff, plen);
+ }
/* zero post-eof blocks as the page may be mapped */
- if (iomap_block_needs_zeroing(iter, pos) &&
+ else if (iomap_block_needs_zeroing(iter, pos) &&
!(iomap->flags & IOMAP_F_BEYOND_EOF)) {
folio_zero_range(folio, poff, plen);
+ if (fsverity_active(iter->inode) &&
+ !fsverity_verify_blocks(folio, plen, poff)) {
+ return -EIO;
+ }
iomap_set_range_uptodate(folio, poff, plen);
} else {
if (!*bytes_submitted)
diff --git a/fs/iomap/ioend.c b/fs/iomap/ioend.c
index 86f44922ed..30c0de3c75 100644
--- a/fs/iomap/ioend.c
+++ b/fs/iomap/ioend.c
@@ -9,6 +9,8 @@
#include "internal.h"
#include "trace.h"
+#define IOMAP_POOL_SIZE (4 * (PAGE_SIZE / SECTOR_SIZE))
+
struct bio_set iomap_ioend_bioset;
EXPORT_SYMBOL_GPL(iomap_ioend_bioset);
@@ -423,9 +425,46 @@
}
EXPORT_SYMBOL_GPL(iomap_split_ioend);
+#ifdef CONFIG_FS_VERITY
+struct bio_set *iomap_fsverity_bioset;
+EXPORT_SYMBOL_GPL(iomap_fsverity_bioset);
+int iomap_fsverity_init_bioset(void)
+{
+ struct bio_set *bs, *old;
+ int error;
+
+ bs = kzalloc(sizeof(*bs), GFP_KERNEL);
+ if (!bs)
+ return -ENOMEM;
+
+ error = bioset_init(bs, IOMAP_POOL_SIZE,
+ offsetof(struct iomap_fsverity_bio, bio),
+ BIOSET_NEED_BVECS);
+ if (error) {
+ kfree(bs);
+ return error;
+ }
+
+ /*
+ * This has to be atomic as readaheads can race to create the
+ * bioset. If someone set the pointer before us, we drop ours.
+ */
+ old = cmpxchg(&iomap_fsverity_bioset, NULL, bs);
+ if (old) {
+ bioset_exit(bs);
+ kfree(bs);
+ }
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(iomap_fsverity_init_bioset);
+#else
+# define iomap_fsverity_init_bioset(...) (-EOPNOTSUPP)
+#endif
+
static int __init iomap_ioend_init(void)
{
- return bioset_init(&iomap_ioend_bioset, 4 * (PAGE_SIZE / SECTOR_SIZE),
+ return bioset_init(&iomap_ioend_bioset, IOMAP_POOL_SIZE,
offsetof(struct iomap_ioend, io_bio),
BIOSET_NEED_BVECS);
}
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 7a7e31c499..b451ab3426 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -342,6 +342,17 @@
iter->srcmap.type == IOMAP_MAPPED;
}
+#ifdef CONFIG_FS_VERITY
+extern struct bio_set *iomap_fsverity_bioset;
+
+struct iomap_fsverity_bio {
+ struct work_struct work;
+ struct bio bio;
+};
+
+int iomap_fsverity_init_bioset(void);
+#endif
+
ssize_t iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *from,
const struct iomap_ops *ops,
const struct iomap_write_ops *write_ops, void *private);
--
- Andrey
^ permalink raw reply related [flat|nested] 86+ messages in thread
* Re: [PATCH v2 5/22] iomap: integrate fs-verity verification into iomap's read path
2026-01-12 14:50 ` [PATCH v2 5/22] iomap: integrate fs-verity verification into iomap's read path Andrey Albershteyn
@ 2026-01-12 22:35 ` Darrick J. Wong
2026-01-13 11:16 ` Andrey Albershteyn
2026-01-13 8:19 ` Christoph Hellwig
1 sibling, 1 reply; 86+ messages in thread
From: Darrick J. Wong @ 2026-01-12 22:35 UTC (permalink / raw)
To: Andrey Albershteyn
Cc: fsverity, linux-xfs, ebiggers, linux-fsdevel, aalbersh, david,
hch
On Mon, Jan 12, 2026 at 03:50:26PM +0100, Andrey Albershteyn wrote:
> This patch adds fs-verity verification into iomap's read path. After
> BIO's io operation is complete the data are verified against
> fs-verity's Merkle tree. Verification work is done in a separate
> workqueue.
>
> The read path ioend iomap_read_ioend are stored side by side with
> BIOs if FS_VERITY is enabled.
>
> Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
> ---
> fs/iomap/bio.c | 66 ++++++++++++++++++++++++++++++++++++++++++++++++----
> fs/iomap/buffered-io.c | 12 ++++++++-
> fs/iomap/ioend.c | 41 +++++++++++++++++++++++++++++++-
> include/linux/iomap.h | 11 ++++++++
> 4 files changed, 123 insertions(+), 7 deletions(-)
>
> diff --git a/fs/iomap/bio.c b/fs/iomap/bio.c
> index fc045f2e4c..ac6c16b1f8 100644
> --- a/fs/iomap/bio.c
> +++ b/fs/iomap/bio.c
> @@ -5,6 +5,7 @@
> */
> #include <linux/iomap.h>
> #include <linux/pagemap.h>
> +#include <linux/fsverity.h>
> #include "internal.h"
> #include "trace.h"
>
> @@ -18,6 +19,60 @@
> bio_put(bio);
> }
>
> +#ifdef CONFIG_FS_VERITY
Should all this stuff go into fs/iomap/fsverity.c instead of ifdef'd
around the iomap code?
<shrug>
> +static void
> +iomap_read_fsverify_end_io_work(struct work_struct *work)
> +{
> + struct iomap_fsverity_bio *fbio =
> + container_of(work, struct iomap_fsverity_bio, work);
> +
> + fsverity_verify_bio(&fbio->bio);
> + iomap_read_end_io(&fbio->bio);
> +}
> +
> +static void
> +iomap_read_fsverity_end_io(struct bio *bio)
> +{
> + struct iomap_fsverity_bio *fbio =
> + container_of(bio, struct iomap_fsverity_bio, bio);
> +
> + INIT_WORK(&fbio->work, iomap_read_fsverify_end_io_work);
> + fsverity_enqueue_verify_work(&fbio->work);
> +}
> +
> +static struct bio *
> +iomap_fsverity_read_bio_alloc(struct inode *inode, struct block_device *bdev,
> + int nr_vecs, gfp_t gfp)
> +{
> + struct bio *bio;
> +
> + bio = bio_alloc_bioset(bdev, nr_vecs, REQ_OP_READ, gfp,
> + iomap_fsverity_bioset);
> + if (bio)
> + bio->bi_end_io = iomap_read_fsverity_end_io;
> + return bio;
> +}
> +
> +#else
> +# define iomap_fsverity_read_bio_alloc(...) (NULL)
> +# define iomap_fsverity_tree_end_align(...) (false)
> +#endif /* CONFIG_FS_VERITY */
> +
> +static struct bio *iomap_read_bio_alloc(struct inode *inode,
> + const struct iomap *iomap, int nr_vecs, gfp_t gfp)
> +{
> + struct bio *bio;
> + struct block_device *bdev = iomap->bdev;
> +
> + if (!(iomap->flags & IOMAP_F_BEYOND_EOF) && fsverity_active(inode))
> + return iomap_fsverity_read_bio_alloc(inode, bdev, nr_vecs, gfp);
> +
> + bio = bio_alloc(bdev, nr_vecs, REQ_OP_READ, gfp);
> + if (bio)
> + bio->bi_end_io = iomap_read_end_io;
> + return bio;
> +}
> +
> static void iomap_bio_submit_read(struct iomap_read_folio_ctx *ctx)
> {
> struct bio *bio = ctx->read_ctx;
> @@ -42,26 +97,27 @@
> !bio_add_folio(bio, folio, plen, poff)) {
> gfp_t gfp = mapping_gfp_constraint(folio->mapping, GFP_KERNEL);
> gfp_t orig_gfp = gfp;
> - unsigned int nr_vecs = DIV_ROUND_UP(length, PAGE_SIZE);
>
> if (bio)
> submit_bio(bio);
>
> if (ctx->rac) /* same as readahead_gfp_mask */
> gfp |= __GFP_NORETRY | __GFP_NOWARN;
> - bio = bio_alloc(iomap->bdev, bio_max_segs(nr_vecs), REQ_OP_READ,
> - gfp);
> + bio = iomap_read_bio_alloc(iter->inode, iomap,
> + bio_max_segs(DIV_ROUND_UP(length, PAGE_SIZE)),
> + gfp);
> +
> /*
> * If the bio_alloc fails, try it again for a single page to
> * avoid having to deal with partial page reads. This emulates
> * what do_mpage_read_folio does.
> */
> if (!bio)
> - bio = bio_alloc(iomap->bdev, 1, REQ_OP_READ, orig_gfp);
> + bio = iomap_read_bio_alloc(iter->inode, iomap, 1,
> + orig_gfp);
> if (ctx->rac)
> bio->bi_opf |= REQ_RAHEAD;
> bio->bi_iter.bi_sector = sector;
> - bio->bi_end_io = iomap_read_end_io;
> bio_add_folio_nofail(bio, folio, plen, poff);
> ctx->read_ctx = bio;
> }
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 79d1c97f02..481f7e1cff 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -8,6 +8,7 @@
> #include <linux/writeback.h>
> #include <linux/swap.h>
> #include <linux/migrate.h>
> +#include <linux/fsverity.h>
> #include "internal.h"
> #include "trace.h"
>
> @@ -532,10 +533,19 @@
> if (plen == 0)
> return 0;
>
> + /* end of fs-verity region*/
> + if ((iomap->flags & IOMAP_F_BEYOND_EOF) && (iomap->type == IOMAP_HOLE)) {
Overly long line.
Also, when do we get the combination of BEYOND_EOF && HOLE? Is that for
sparse regions in only the merkle tree? IIRC (and I could be wrong)
fsverity still wants to checksum sparse holes in the regular file data,
right?
> + folio_zero_range(folio, poff, plen);
> + iomap_set_range_uptodate(folio, poff, plen);
> + }
> /* zero post-eof blocks as the page may be mapped */
> - if (iomap_block_needs_zeroing(iter, pos) &&
> + else if (iomap_block_needs_zeroing(iter, pos) &&
} else if (...
(nitpicking indentation)
> !(iomap->flags & IOMAP_F_BEYOND_EOF)) {
> folio_zero_range(folio, poff, plen);
> + if (fsverity_active(iter->inode) &&
> + !fsverity_verify_blocks(folio, plen, poff)) {
> + return -EIO;
> + }
> iomap_set_range_uptodate(folio, poff, plen);
> } else {
> if (!*bytes_submitted)
> diff --git a/fs/iomap/ioend.c b/fs/iomap/ioend.c
> index 86f44922ed..30c0de3c75 100644
> --- a/fs/iomap/ioend.c
> +++ b/fs/iomap/ioend.c
> @@ -9,6 +9,8 @@
> #include "internal.h"
> #include "trace.h"
>
> +#define IOMAP_POOL_SIZE (4 * (PAGE_SIZE / SECTOR_SIZE))
How do we arrive at this pool size? How is it important to have a
larger bio reserve pool for *larger* base page sizes?
--D
> +
> struct bio_set iomap_ioend_bioset;
> EXPORT_SYMBOL_GPL(iomap_ioend_bioset);
>
> @@ -423,9 +425,46 @@
> }
> EXPORT_SYMBOL_GPL(iomap_split_ioend);
>
> +#ifdef CONFIG_FS_VERITY
> +struct bio_set *iomap_fsverity_bioset;
> +EXPORT_SYMBOL_GPL(iomap_fsverity_bioset);
> +int iomap_fsverity_init_bioset(void)
> +{
> + struct bio_set *bs, *old;
> + int error;
> +
> + bs = kzalloc(sizeof(*bs), GFP_KERNEL);
> + if (!bs)
> + return -ENOMEM;
> +
> + error = bioset_init(bs, IOMAP_POOL_SIZE,
> + offsetof(struct iomap_fsverity_bio, bio),
> + BIOSET_NEED_BVECS);
> + if (error) {
> + kfree(bs);
> + return error;
> + }
> +
> + /*
> + * This has to be atomic as readaheads can race to create the
> + * bioset. If someone set the pointer before us, we drop ours.
> + */
> + old = cmpxchg(&iomap_fsverity_bioset, NULL, bs);
> + if (old) {
> + bioset_exit(bs);
> + kfree(bs);
> + }
> +
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(iomap_fsverity_init_bioset);
> +#else
> +# define iomap_fsverity_init_bioset(...) (-EOPNOTSUPP)
> +#endif
> +
> static int __init iomap_ioend_init(void)
> {
> - return bioset_init(&iomap_ioend_bioset, 4 * (PAGE_SIZE / SECTOR_SIZE),
> + return bioset_init(&iomap_ioend_bioset, IOMAP_POOL_SIZE,
> offsetof(struct iomap_ioend, io_bio),
> BIOSET_NEED_BVECS);
> }
> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index 7a7e31c499..b451ab3426 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -342,6 +342,17 @@
> iter->srcmap.type == IOMAP_MAPPED;
> }
>
> +#ifdef CONFIG_FS_VERITY
> +extern struct bio_set *iomap_fsverity_bioset;
> +
> +struct iomap_fsverity_bio {
> + struct work_struct work;
> + struct bio bio;
> +};
> +
> +int iomap_fsverity_init_bioset(void);
> +#endif
> +
> ssize_t iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *from,
> const struct iomap_ops *ops,
> const struct iomap_write_ops *write_ops, void *private);
>
> --
> - Andrey
>
>
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH v2 5/22] iomap: integrate fs-verity verification into iomap's read path
2026-01-12 22:35 ` Darrick J. Wong
@ 2026-01-13 11:16 ` Andrey Albershteyn
2026-01-13 16:23 ` Christoph Hellwig
0 siblings, 1 reply; 86+ messages in thread
From: Andrey Albershteyn @ 2026-01-13 11:16 UTC (permalink / raw)
To: Darrick J. Wong
Cc: fsverity, linux-xfs, ebiggers, linux-fsdevel, aalbersh, david,
hch
On 2026-01-12 14:35:55, Darrick J. Wong wrote:
> On Mon, Jan 12, 2026 at 03:50:26PM +0100, Andrey Albershteyn wrote:
> > This patch adds fs-verity verification into iomap's read path. After
> > BIO's io operation is complete the data are verified against
> > fs-verity's Merkle tree. Verification work is done in a separate
> > workqueue.
> >
> > The read path ioend iomap_read_ioend are stored side by side with
> > BIOs if FS_VERITY is enabled.
> >
> > Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
> > ---
> > fs/iomap/bio.c | 66 ++++++++++++++++++++++++++++++++++++++++++++++++----
> > fs/iomap/buffered-io.c | 12 ++++++++-
> > fs/iomap/ioend.c | 41 +++++++++++++++++++++++++++++++-
> > include/linux/iomap.h | 11 ++++++++
> > 4 files changed, 123 insertions(+), 7 deletions(-)
> >
> > diff --git a/fs/iomap/bio.c b/fs/iomap/bio.c
> > index fc045f2e4c..ac6c16b1f8 100644
> > --- a/fs/iomap/bio.c
> > +++ b/fs/iomap/bio.c
> > @@ -5,6 +5,7 @@
> > */
> > #include <linux/iomap.h>
> > #include <linux/pagemap.h>
> > +#include <linux/fsverity.h>
> > #include "internal.h"
> > #include "trace.h"
> >
> > @@ -18,6 +19,60 @@
> > bio_put(bio);
> > }
> >
> > +#ifdef CONFIG_FS_VERITY
>
> Should all this stuff go into fs/iomap/fsverity.c instead of ifdef'd
> around the iomap code?
>
> <shrug>
oh, sure, this would be better
>
> > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> > index 79d1c97f02..481f7e1cff 100644
> > --- a/fs/iomap/buffered-io.c
> > +++ b/fs/iomap/buffered-io.c
> > @@ -8,6 +8,7 @@
> > #include <linux/writeback.h>
> > #include <linux/swap.h>
> > #include <linux/migrate.h>
> > +#include <linux/fsverity.h>
> > #include "internal.h"
> > #include "trace.h"
> >
> > @@ -532,10 +533,19 @@
> > if (plen == 0)
> > return 0;
> >
> > + /* end of fs-verity region*/
> > + if ((iomap->flags & IOMAP_F_BEYOND_EOF) && (iomap->type == IOMAP_HOLE)) {
>
> Overly long line.
will fix
>
> Also, when do we get the combination of BEYOND_EOF && HOLE? Is that for
> sparse regions in only the merkle tree? IIRC (and I could be wrong)
> fsverity still wants to checksum sparse holes in the regular file data,
> right?
IOMAP_F_BEYOND_EOF is only set for fsverity metadata. This case handles
the tail of the merkle tree. With a 1k fs block size, 4k pages and 1k
fsverity blocks, fsverity can request a page which holds less than 4
fsverity blocks of tree (e.g. 3072 bytes). The last 1k block in the
page will then be a hole, so we just zero it out and mark it uptodate.
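To make the numbers concrete, here is a rough userspace sketch of that tail
case. It is not code from the series: struct tail_slice and
split_tree_tail() are hypothetical names made up for illustration, and the
sketch assumes the tree size within the page is known up front.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type, not part of the patch: one block-sized
 * slice of a page that caches the tail of the Merkle tree. */
struct tail_slice {
	size_t off;	/* offset of the slice inside the page */
	size_t len;	/* length of the slice in bytes */
	int is_hole;	/* 1: nothing on disk, zero-fill; 0: read from disk */
};

/*
 * Split one page into block_size slices and mark every slice that
 * starts at or past the end of the tree data as a hole. Returns the
 * number of slices written to out.
 */
static size_t split_tree_tail(size_t page_size, size_t block_size,
			      size_t tree_bytes_in_page,
			      struct tail_slice *out, size_t max)
{
	size_t n = 0;

	assert(block_size > 0 && page_size % block_size == 0);
	assert(tree_bytes_in_page <= page_size);

	for (size_t off = 0; off < page_size && n < max;
	     off += block_size, n++) {
		out[n].off = off;
		out[n].len = block_size;
		out[n].is_hole = off >= tree_bytes_in_page;
	}
	return n;
}
```

With page_size = 4096, block_size = 1024 and tree_bytes_in_page = 3072, the
first three slices are backed by tree data and only the slice at offset 3072
is a hole. That is the slice the new branch zeroes and marks uptodate without
issuing a read.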
>
> > + folio_zero_range(folio, poff, plen);
> > + iomap_set_range_uptodate(folio, poff, plen);
> > + }
> > /* zero post-eof blocks as the page may be mapped */
> > - if (iomap_block_needs_zeroing(iter, pos) &&
> > + else if (iomap_block_needs_zeroing(iter, pos) &&
>
> } else if (...
>
> (nitpicking indentation)
>
> > !(iomap->flags & IOMAP_F_BEYOND_EOF)) {
> > folio_zero_range(folio, poff, plen);
> > + if (fsverity_active(iter->inode) &&
> > + !fsverity_verify_blocks(folio, plen, poff)) {
> > + return -EIO;
> > + }
> > iomap_set_range_uptodate(folio, poff, plen);
> > } else {
> > if (!*bytes_submitted)
> > diff --git a/fs/iomap/ioend.c b/fs/iomap/ioend.c
> > index 86f44922ed..30c0de3c75 100644
> > --- a/fs/iomap/ioend.c
> > +++ b/fs/iomap/ioend.c
> > @@ -9,6 +9,8 @@
> > #include "internal.h"
> > #include "trace.h"
> >
> > +#define IOMAP_POOL_SIZE (4 * (PAGE_SIZE / SECTOR_SIZE))
>
> How do we arrive at this pool size? How is it important to have a
> larger bio reserve pool for *larger* base page sizes?
Well, this is just the one which iomap uses by default for the read pool.
I'm not sure I know enough to optimize the pool size here :)
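For reference, the arithmetic behind the question can be sketched in
userspace as below. demo_iomap_pool_size() and DEMO_SECTOR_SIZE are
illustrative names only; the sketch assumes the kernel's 512-byte
SECTOR_SIZE and takes the page size as a parameter instead of using
PAGE_SIZE.

```c
#include <assert.h>

#define DEMO_SECTOR_SIZE 512u	/* assumed to match the kernel's SECTOR_SIZE */

/*
 * Same expression as IOMAP_POOL_SIZE in the patch, with the page size
 * passed in so that different base page sizes can be compared.
 */
static unsigned int demo_iomap_pool_size(unsigned int page_size)
{
	assert(page_size >= DEMO_SECTOR_SIZE);
	assert(page_size % DEMO_SECTOR_SIZE == 0);

	return 4 * (page_size / DEMO_SECTOR_SIZE);
}
```

This gives 32 reserved bios with 4k pages and 512 with 64k pages, so the
reserve does grow linearly with the base page size.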
--
- Andrey
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH v2 5/22] iomap: integrate fs-verity verification into iomap's read path
2026-01-13 11:16 ` Andrey Albershteyn
@ 2026-01-13 16:23 ` Christoph Hellwig
0 siblings, 0 replies; 86+ messages in thread
From: Christoph Hellwig @ 2026-01-13 16:23 UTC (permalink / raw)
To: Andrey Albershteyn
Cc: Darrick J. Wong, fsverity, linux-xfs, ebiggers, linux-fsdevel,
aalbersh, david, hch
On Tue, Jan 13, 2026 at 12:16:45PM +0100, Andrey Albershteyn wrote:
> > Also, when do we get the combination of BEYOND_EOF && HOLE? Is that for
> > sparse regions in only the merkle tree? IIRC (and I could be wrong)
> > fsverity still wants to checksum sparse holes in the regular file data,
> > right?
>
> IOMAP_F_BEYOND_EOF is only set for fsverity metadata. This case handles
> the tail of the merkle tree. With a 1k fs block size, 4k pages and 1k
> fsverity blocks, fsverity can request a page which holds less than 4
> fsverity blocks of tree (e.g. 3072 bytes). The last 1k block in the
> page will then be a hole, so we just zero it out and mark it uptodate.
Can you add a comment explaining this?
> > > +#define IOMAP_POOL_SIZE (4 * (PAGE_SIZE / SECTOR_SIZE))
> >
> > How do we arrive at this pool size? How is it important to have a
> > larger bio reserve pool for *larger* base page sizes?
>
> Well, this is just the one which iomap uses by default for the read pool.
> I'm not sure I know enough to optimize the pool size here :)
Note that all this would go away when using ioends.
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH v2 5/22] iomap: integrate fs-verity verification into iomap's read path
2026-01-12 14:50 ` [PATCH v2 5/22] iomap: integrate fs-verity verification into iomap's read path Andrey Albershteyn
2026-01-12 22:35 ` Darrick J. Wong
@ 2026-01-13 8:19 ` Christoph Hellwig
1 sibling, 0 replies; 86+ messages in thread
From: Christoph Hellwig @ 2026-01-13 8:19 UTC (permalink / raw)
To: Andrey Albershteyn
Cc: fsverity, linux-xfs, ebiggers, linux-fsdevel, aalbersh, djwong,
david, hch
On Mon, Jan 12, 2026 at 03:50:26PM +0100, Andrey Albershteyn wrote:
> +#ifdef CONFIG_FS_VERITY
> +static void
> +iomap_read_fsverify_end_io_work(struct work_struct *work)
> +{
> + struct iomap_fsverity_bio *fbio =
> + container_of(work, struct iomap_fsverity_bio, work);
> +
> + fsverity_verify_bio(&fbio->bio);
> + iomap_read_end_io(&fbio->bio);
> +}
I'd much rather use the ioend processing infrastructure for this than
reinvent another deferral method. This series:
https://git.infradead.org/?p=users/hch/misc.git;a=shortlog;h=refs/heads/iomap-pi
has the small patches enabling it for reads, and I will post a rebased
and updated version soon.
^ permalink raw reply [flat|nested] 86+ messages in thread
* [PATCH v2 6/22] xfs: add fs-verity ro-compat flag
2026-01-12 14:49 [PATCH v2 0/23] fs-verity support for XFS with post EOF merkle tree Andrey Albershteyn
` (4 preceding siblings ...)
2026-01-12 14:50 ` [PATCH v2 5/22] iomap: integrate fs-verity verification into iomap's read path Andrey Albershteyn
@ 2026-01-12 14:50 ` Andrey Albershteyn
2026-01-12 14:50 ` [PATCH v2 7/22] xfs: add inode on-disk VERITY flag Andrey Albershteyn
` (16 subsequent siblings)
22 siblings, 0 replies; 86+ messages in thread
From: Andrey Albershteyn @ 2026-01-12 14:50 UTC (permalink / raw)
To: fsverity, linux-xfs, ebiggers, linux-fsdevel, aalbersh, aalbersh,
djwong
Cc: djwong, david, hch
To mark inodes with fs-verity enabled, the new XFS_DIFLAG2_VERITY flag
will be added in a later patch. This requires a ro-compat flag to let
older kernels know that a filesystem with fs-verity cannot be modified.
Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
fs/xfs/libxfs/xfs_format.h | 1 +
fs/xfs/libxfs/xfs_sb.c | 2 ++
fs/xfs/xfs_mount.h | 2 ++
3 files changed, 5 insertions(+), 0 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index 779dac59b1..64c2acd1cf 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -374,6 +374,7 @@
#define XFS_SB_FEAT_RO_COMPAT_RMAPBT (1 << 1) /* reverse map btree */
#define XFS_SB_FEAT_RO_COMPAT_REFLINK (1 << 2) /* reflinked files */
#define XFS_SB_FEAT_RO_COMPAT_INOBTCNT (1 << 3) /* inobt block counts */
+#define XFS_SB_FEAT_RO_COMPAT_VERITY (1 << 4) /* fs-verity */
#define XFS_SB_FEAT_RO_COMPAT_ALL \
(XFS_SB_FEAT_RO_COMPAT_FINOBT | \
XFS_SB_FEAT_RO_COMPAT_RMAPBT | \
diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
index 94c272a2ae..744bd8480b 100644
--- a/fs/xfs/libxfs/xfs_sb.c
+++ b/fs/xfs/libxfs/xfs_sb.c
@@ -165,6 +165,8 @@
features |= XFS_FEAT_REFLINK;
if (sbp->sb_features_ro_compat & XFS_SB_FEAT_RO_COMPAT_INOBTCNT)
features |= XFS_FEAT_INOBTCNT;
+ if (sbp->sb_features_ro_compat & XFS_SB_FEAT_RO_COMPAT_VERITY)
+ features |= XFS_FEAT_VERITY;
if (sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_FTYPE)
features |= XFS_FEAT_FTYPE;
if (sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_SPINODES)
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index b871dfde37..8ef7fea8b3 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -381,6 +381,7 @@
#define XFS_FEAT_EXCHANGE_RANGE (1ULL << 27) /* exchange range */
#define XFS_FEAT_METADIR (1ULL << 28) /* metadata directory tree */
#define XFS_FEAT_ZONED (1ULL << 29) /* zoned RT device */
+#define XFS_FEAT_VERITY (1ULL << 30) /* fs-verity */
/* Mount features */
#define XFS_FEAT_NOLIFETIME (1ULL << 47) /* disable lifetime hints */
@@ -438,6 +439,7 @@
__XFS_HAS_FEAT(metadir, METADIR)
__XFS_HAS_FEAT(zoned, ZONED)
__XFS_HAS_FEAT(nolifetime, NOLIFETIME)
+__XFS_HAS_FEAT(verity, VERITY)
static inline bool xfs_has_rtgroups(const struct xfs_mount *mp)
{
--
- Andrey
^ permalink raw reply related [flat|nested] 86+ messages in thread
* [PATCH v2 7/22] xfs: add inode on-disk VERITY flag
2026-01-12 14:49 [PATCH v2 0/23] fs-verity support for XFS with post EOF merkle tree Andrey Albershteyn
` (5 preceding siblings ...)
2026-01-12 14:50 ` [PATCH v2 6/22] xfs: add fs-verity ro-compat flag Andrey Albershteyn
@ 2026-01-12 14:50 ` Andrey Albershteyn
2026-01-12 14:50 ` [PATCH v2 8/22] xfs: initialize fs-verity on file open and cleanup on inode destruction Andrey Albershteyn
` (15 subsequent siblings)
22 siblings, 0 replies; 86+ messages in thread
From: Andrey Albershteyn @ 2026-01-12 14:50 UTC (permalink / raw)
To: fsverity, linux-xfs, ebiggers, linux-fsdevel, aalbersh, aalbersh,
djwong
Cc: djwong, david, hch
Add a flag to mark inodes which have fs-verity enabled on them (i.e.
the descriptor exists and the tree is built).
Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
fs/xfs/libxfs/xfs_format.h | 7 ++++++-
fs/xfs/libxfs/xfs_inode_buf.c | 8 ++++++++
fs/xfs/libxfs/xfs_inode_util.c | 2 ++
fs/xfs/xfs_iops.c | 2 ++
4 files changed, 18 insertions(+), 1 deletion(-)
diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index 64c2acd1cf..d67b404964 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -1231,16 +1231,21 @@
*/
#define XFS_DIFLAG2_METADATA_BIT 5
+/* inodes sealed with fs-verity */
+#define XFS_DIFLAG2_VERITY_BIT 6
+
#define XFS_DIFLAG2_DAX (1ULL << XFS_DIFLAG2_DAX_BIT)
#define XFS_DIFLAG2_REFLINK (1ULL << XFS_DIFLAG2_REFLINK_BIT)
#define XFS_DIFLAG2_COWEXTSIZE (1ULL << XFS_DIFLAG2_COWEXTSIZE_BIT)
#define XFS_DIFLAG2_BIGTIME (1ULL << XFS_DIFLAG2_BIGTIME_BIT)
#define XFS_DIFLAG2_NREXT64 (1ULL << XFS_DIFLAG2_NREXT64_BIT)
#define XFS_DIFLAG2_METADATA (1ULL << XFS_DIFLAG2_METADATA_BIT)
+#define XFS_DIFLAG2_VERITY (1ULL << XFS_DIFLAG2_VERITY_BIT)
#define XFS_DIFLAG2_ANY \
(XFS_DIFLAG2_DAX | XFS_DIFLAG2_REFLINK | XFS_DIFLAG2_COWEXTSIZE | \
- XFS_DIFLAG2_BIGTIME | XFS_DIFLAG2_NREXT64 | XFS_DIFLAG2_METADATA)
+ XFS_DIFLAG2_BIGTIME | XFS_DIFLAG2_NREXT64 | XFS_DIFLAG2_METADATA | \
+ XFS_DIFLAG2_VERITY)
static inline bool xfs_dinode_has_bigtime(const struct xfs_dinode *dip)
{
diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
index b1812b2c3c..c4fff7a34c 100644
--- a/fs/xfs/libxfs/xfs_inode_buf.c
+++ b/fs/xfs/libxfs/xfs_inode_buf.c
@@ -756,6 +756,14 @@
!xfs_has_rtreflink(mp))
return __this_address;
+ /* only regular files can have fsverity */
+ if (flags2 & XFS_DIFLAG2_VERITY) {
+ if (!xfs_has_verity(mp))
+ return __this_address;
+ if ((mode & S_IFMT) != S_IFREG)
+ return __this_address;
+ }
+
if (xfs_has_zoned(mp) &&
dip->di_metatype == cpu_to_be16(XFS_METAFILE_RTRMAP)) {
if (be32_to_cpu(dip->di_used_blocks) > mp->m_sb.sb_rgextents)
diff --git a/fs/xfs/libxfs/xfs_inode_util.c b/fs/xfs/libxfs/xfs_inode_util.c
index 309ce6dd55..aaf51207b2 100644
--- a/fs/xfs/libxfs/xfs_inode_util.c
+++ b/fs/xfs/libxfs/xfs_inode_util.c
@@ -126,6 +126,8 @@
flags |= FS_XFLAG_DAX;
if (ip->i_diflags2 & XFS_DIFLAG2_COWEXTSIZE)
flags |= FS_XFLAG_COWEXTSIZE;
+ if (ip->i_diflags2 & XFS_DIFLAG2_VERITY)
+ flags |= FS_XFLAG_VERITY;
}
if (xfs_inode_has_attr_fork(ip))
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index ad94fbf550..6b8e4e87ab 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -1394,6 +1394,8 @@
flags |= S_NOATIME;
if (init && xfs_inode_should_enable_dax(ip))
flags |= S_DAX;
+ if (xflags & FS_XFLAG_VERITY)
+ flags |= S_VERITY;
/*
* S_DAX can only be set during inode initialization and is never set by
--
- Andrey
^ permalink raw reply related [flat|nested] 86+ messages in thread
* [PATCH v2 8/22] xfs: initialize fs-verity on file open and cleanup on inode destruction
2026-01-12 14:49 [PATCH v2 0/23] fs-verity support for XFS with post EOF merkle tree Andrey Albershteyn
` (6 preceding siblings ...)
2026-01-12 14:50 ` [PATCH v2 7/22] xfs: add inode on-disk VERITY flag Andrey Albershteyn
@ 2026-01-12 14:50 ` Andrey Albershteyn
2026-01-12 14:50 ` [PATCH v2 9/22] xfs: don't allow to enable DAX on fs-verity sealed inode Andrey Albershteyn
` (14 subsequent siblings)
22 siblings, 0 replies; 86+ messages in thread
From: Andrey Albershteyn @ 2026-01-12 14:50 UTC (permalink / raw)
To: fsverity, linux-xfs, ebiggers, linux-fsdevel, aalbersh, aalbersh,
djwong
Cc: djwong, david, hch
fs-verity will read and attach metadata (not the tree itself) from
disk for those inodes which already have fs-verity enabled.
Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
fs/xfs/xfs_file.c | 8 ++++++++
fs/xfs/xfs_super.c | 2 ++
2 files changed, 10 insertions(+), 0 deletions(-)
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 7874cf745a..0c234ab449 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -36,6 +36,7 @@
#include <linux/mman.h>
#include <linux/fadvise.h>
#include <linux/mount.h>
+#include <linux/fsverity.h>
static const struct vm_operations_struct xfs_file_vm_ops;
@@ -1604,11 +1605,18 @@
struct inode *inode,
struct file *file)
{
+ int error;
+
if (xfs_is_shutdown(XFS_M(inode->i_sb)))
return -EIO;
file->f_mode |= FMODE_NOWAIT | FMODE_CAN_ODIRECT;
if (xfs_get_atomic_write_min(XFS_I(inode)) > 0)
file->f_mode |= FMODE_CAN_ATOMIC_WRITE;
+
+ error = fsverity_file_open(inode, file);
+ if (error)
+ return error;
+
return generic_file_open(inode, file);
}
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index bc71aa9dce..10c6fc8d20 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -53,6 +53,7 @@
#include <linux/magic.h>
#include <linux/fs_context.h>
#include <linux/fs_parser.h>
+#include <linux/fsverity.h>
static const struct super_operations xfs_super_operations;
@@ -709,6 +710,7 @@
ASSERT(!rwsem_is_locked(&inode->i_rwsem));
XFS_STATS_INC(ip->i_mount, vn_rele);
XFS_STATS_INC(ip->i_mount, vn_remove);
+ fsverity_cleanup_inode(inode);
xfs_inode_mark_reclaimable(ip);
}
--
- Andrey
^ permalink raw reply related [flat|nested] 86+ messages in thread
* [PATCH v2 9/22] xfs: don't allow to enable DAX on fs-verity sealed inode
2026-01-12 14:49 [PATCH v2 0/23] fs-verity support for XFS with post EOF merkle tree Andrey Albershteyn
` (7 preceding siblings ...)
2026-01-12 14:50 ` [PATCH v2 8/22] xfs: initialize fs-verity on file open and cleanup on inode destruction Andrey Albershteyn
@ 2026-01-12 14:50 ` Andrey Albershteyn
2026-01-12 14:51 ` [PATCH v2 10/22] xfs: disable direct read path for fs-verity files Andrey Albershteyn
` (13 subsequent siblings)
22 siblings, 0 replies; 86+ messages in thread
From: Andrey Albershteyn @ 2026-01-12 14:50 UTC (permalink / raw)
To: fsverity, linux-xfs, ebiggers, linux-fsdevel, aalbersh, aalbersh,
djwong
Cc: djwong, david, hch
fs-verity doesn't support DAX. Forbid the filesystem from enabling DAX
on inodes which already have fs-verity enabled. The opposite case is
checked when fs-verity is enabled: it won't be enabled if DAX is.
Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: fix typo in subject]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
fs/xfs/xfs_iops.c | 2 ++
1 file changed, 2 insertions(+), 0 deletions(-)
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 6b8e4e87ab..44c6161137 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -1366,6 +1366,8 @@
return false;
if (!xfs_inode_supports_dax(ip))
return false;
+ if (ip->i_diflags2 & XFS_DIFLAG2_VERITY)
+ return false;
if (xfs_has_dax_always(ip->i_mount))
return true;
if (ip->i_diflags2 & XFS_DIFLAG2_DAX)
--
- Andrey
^ permalink raw reply related [flat|nested] 86+ messages in thread
* [PATCH v2 10/22] xfs: disable direct read path for fs-verity files
2026-01-12 14:49 [PATCH v2 0/23] fs-verity support for XFS with post EOF merkle tree Andrey Albershteyn
` (8 preceding siblings ...)
2026-01-12 14:50 ` [PATCH v2 9/22] xfs: don't allow to enable DAX on fs-verity sealed inode Andrey Albershteyn
@ 2026-01-12 14:51 ` Andrey Albershteyn
2026-01-13 8:20 ` Christoph Hellwig
2026-01-12 14:51 ` [PATCH v2 11/22] xfs: add verity info pointer to xfs inode Andrey Albershteyn
` (12 subsequent siblings)
22 siblings, 1 reply; 86+ messages in thread
From: Andrey Albershteyn @ 2026-01-12 14:51 UTC (permalink / raw)
To: fsverity, linux-xfs, ebiggers, linux-fsdevel, aalbersh, aalbersh,
djwong
Cc: djwong, david, hch
The direct I/O path is not supported on verity files. Attempts to use
the direct I/O path on such files should fall back to the buffered I/O
path.
Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: fix braces]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
fs/xfs/xfs_file.c | 15 ++++++++++++---
1 file changed, 12 insertions(+), 3 deletions(-)
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 0c234ab449..87a96b81f0 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -254,7 +254,8 @@
struct kiocb *iocb,
struct iov_iter *to)
{
- struct xfs_inode *ip = XFS_I(iocb->ki_filp->f_mapping->host);
+ struct inode *inode = iocb->ki_filp->f_mapping->host;
+ struct xfs_inode *ip = XFS_I(inode);
ssize_t ret = 0;
trace_xfs_file_dax_read(iocb, to);
@@ -307,10 +308,18 @@
if (IS_DAX(inode))
ret = xfs_file_dax_read(iocb, to);
- else if (iocb->ki_flags & IOCB_DIRECT)
+ else if ((iocb->ki_flags & IOCB_DIRECT) && !fsverity_active(inode))
ret = xfs_file_dio_read(iocb, to);
- else
+ else {
+ /*
+ * In case fs-verity is enabled, we also fallback to the
+ * buffered read from the direct read path. Therefore,
+ * IOCB_DIRECT is set and need to be cleared (see
+ * generic_file_read_iter())
+ */
+ iocb->ki_flags &= ~IOCB_DIRECT;
ret = xfs_file_buffered_read(iocb, to);
+ }
if (ret > 0)
XFS_STATS_ADD(mp, xs_read_bytes, ret);
--
- Andrey
^ permalink raw reply related [flat|nested] 86+ messages in thread
* Re: [PATCH v2 10/22] xfs: disable direct read path for fs-verity files
2026-01-12 14:51 ` [PATCH v2 10/22] xfs: disable direct read path for fs-verity files Andrey Albershteyn
@ 2026-01-13 8:20 ` Christoph Hellwig
2026-01-13 11:22 ` Andrey Albershteyn
0 siblings, 1 reply; 86+ messages in thread
From: Christoph Hellwig @ 2026-01-13 8:20 UTC (permalink / raw)
To: Andrey Albershteyn
Cc: fsverity, linux-xfs, ebiggers, linux-fsdevel, aalbersh, djwong,
david, hch
On Mon, Jan 12, 2026 at 03:51:03PM +0100, Andrey Albershteyn wrote:
> if (IS_DAX(inode))
> ret = xfs_file_dax_read(iocb, to);
> - else if (iocb->ki_flags & IOCB_DIRECT)
> + else if ((iocb->ki_flags & IOCB_DIRECT) && !fsverity_active(inode))
> ret = xfs_file_dio_read(iocb, to);
> - else
> + else {
> + /*
> + * In case fs-verity is enabled, we also fallback to the
> + * buffered read from the direct read path. Therefore,
> + * IOCB_DIRECT is set and need to be cleared (see
> + * generic_file_read_iter())
> + */
> + iocb->ki_flags &= ~IOCB_DIRECT;
> ret = xfs_file_buffered_read(iocb, to);
> + }
I think this might actually be easier as:
if (fsverity_active(inode))
iocb->ki_flags &= ~IOCB_DIRECT;
...
<existing if/else>
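Spelled out as a standalone userspace model, the suggested shape might look
roughly like the sketch below. Everything prefixed with demo_ or DEMO_ is a
stand-in invented for illustration (DEMO_IOCB_DIRECT for IOCB_DIRECT, the
enum for the three xfs read helpers); it is not the actual xfs code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_IOCB_DIRECT 0x1u	/* stand-in for IOCB_DIRECT */

/* Stand-ins for xfs_file_dax_read/xfs_file_dio_read/xfs_file_buffered_read */
enum demo_read_path { DEMO_READ_DAX, DEMO_READ_DIO, DEMO_READ_BUFFERED };

/*
 * Clear the direct I/O flag up front for verity files, then run the
 * unmodified if/else chain. Verity files can then never reach the
 * direct read path, and the buffered path never sees the flag set.
 */
static enum demo_read_path demo_pick_read_path(bool is_dax,
					       bool verity_active,
					       unsigned int *ki_flags)
{
	assert(ki_flags != NULL);

	if (verity_active)
		*ki_flags &= ~DEMO_IOCB_DIRECT;

	if (is_dax)
		return DEMO_READ_DAX;
	if (*ki_flags & DEMO_IOCB_DIRECT)
		return DEMO_READ_DIO;
	return DEMO_READ_BUFFERED;
}
```

The point of this shape is that the existing dispatch stays untouched and the
verity special case lives in a single place before it.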
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH v2 10/22] xfs: disable direct read path for fs-verity files
2026-01-13 8:20 ` Christoph Hellwig
@ 2026-01-13 11:22 ` Andrey Albershteyn
0 siblings, 0 replies; 86+ messages in thread
From: Andrey Albershteyn @ 2026-01-13 11:22 UTC (permalink / raw)
To: Christoph Hellwig
Cc: fsverity, linux-xfs, ebiggers, linux-fsdevel, aalbersh, djwong,
david
On 2026-01-13 09:20:39, Christoph Hellwig wrote:
> On Mon, Jan 12, 2026 at 03:51:03PM +0100, Andrey Albershteyn wrote:
> > if (IS_DAX(inode))
> > ret = xfs_file_dax_read(iocb, to);
> > - else if (iocb->ki_flags & IOCB_DIRECT)
> > + else if ((iocb->ki_flags & IOCB_DIRECT) && !fsverity_active(inode))
> > ret = xfs_file_dio_read(iocb, to);
> > - else
> > + else {
> > + /*
> > + * In case fs-verity is enabled, we also fallback to the
> > + * buffered read from the direct read path. Therefore,
> > + * IOCB_DIRECT is set and need to be cleared (see
> > + * generic_file_read_iter())
> > + */
> > + iocb->ki_flags &= ~IOCB_DIRECT;
> > ret = xfs_file_buffered_read(iocb, to);
> > + }
>
> I think this might actually be easier as:
>
> if (fsverity_active(inode))
> iocb->ki_flags &= ~IOCB_DIRECT;
>
> ...
> <existing if/else>
>
sure, thanks!
--
- Andrey
* [PATCH v2 11/22] xfs: add verity info pointer to xfs inode
2026-01-12 14:49 [PATCH v2 0/23] fs-verity support for XFS with post EOF merkle tree Andrey Albershteyn
` (9 preceding siblings ...)
2026-01-12 14:51 ` [PATCH v2 10/22] xfs: disable direct read path for fs-verity files Andrey Albershteyn
@ 2026-01-12 14:51 ` Andrey Albershteyn
2026-01-12 22:39 ` Darrick J. Wong
2026-01-12 14:51 ` [PATCH v2 12/22] xfs: introduce XFS_FSVERITY_CONSTRUCTION inode flag Andrey Albershteyn
` (11 subsequent siblings)
22 siblings, 1 reply; 86+ messages in thread
From: Andrey Albershteyn @ 2026-01-12 14:51 UTC (permalink / raw)
To: fsverity, linux-xfs, ebiggers, linux-fsdevel, aalbersh, aalbersh,
djwong
Cc: djwong, david, hch
Add the fsverity_info pointer into the filesystem-specific part of the
inode by adding the field xfs_inode->i_verity_info and configuring
fsverity_operations::inode_info_offs accordingly.
Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
---
fs/xfs/xfs_icache.c | 3 +++
fs/xfs/xfs_inode.h | 5 +++++
2 files changed, 8 insertions(+), 0 deletions(-)
diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 23a920437f..872785c68a 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -130,6 +130,9 @@
spin_lock_init(&ip->i_ioend_lock);
ip->i_next_unlinked = NULLAGINO;
ip->i_prev_unlinked = 0;
+#ifdef CONFIG_FS_VERITY
+ ip->i_verity_info = NULL;
+#endif
return ip;
}
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index bd6d335571..f149cb1eb5 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -9,6 +9,7 @@
#include "xfs_inode_buf.h"
#include "xfs_inode_fork.h"
#include "xfs_inode_util.h"
+#include <linux/fsverity.h>
/*
* Kernel only inode definitions
@@ -99,6 +100,10 @@
spinlock_t i_ioend_lock;
struct work_struct i_ioend_work;
struct list_head i_ioend_list;
+
+#ifdef CONFIG_FS_VERITY
+ struct fsverity_info *i_verity_info;
+#endif
} xfs_inode_t;
static inline bool xfs_inode_on_unlinked_list(const struct xfs_inode *ip)
--
- Andrey
* Re: [PATCH v2 11/22] xfs: add verity info pointer to xfs inode
2026-01-12 14:51 ` [PATCH v2 11/22] xfs: add verity info pointer to xfs inode Andrey Albershteyn
@ 2026-01-12 22:39 ` Darrick J. Wong
2026-01-13 8:21 ` Christoph Hellwig
0 siblings, 1 reply; 86+ messages in thread
From: Darrick J. Wong @ 2026-01-12 22:39 UTC (permalink / raw)
To: Andrey Albershteyn
Cc: fsverity, linux-xfs, ebiggers, linux-fsdevel, aalbersh, david,
hch
On Mon, Jan 12, 2026 at 03:51:10PM +0100, Andrey Albershteyn wrote:
> Add the fsverity_info pointer into the filesystem-specific part of the
> inode by adding the field xfs_inode->i_verity_info and configuring
> fsverity_operations::inode_info_offs accordingly.
>
> Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
I kinda don't like adding another pointer to struct xfs_inode but I
can't see a better solution. Inodes can add or drop verity status
dynamically so we can't do this trick:
struct xfs_verity_inode {
struct xfs_inode hork;
struct fsverity_info dork;
};
and only allocate the xfs_verity_inode when needed. Therefore,
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
--D
> ---
> fs/xfs/xfs_icache.c | 3 +++
> fs/xfs/xfs_inode.h | 5 +++++
> 2 files changed, 8 insertions(+), 0 deletions(-)
>
> diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> index 23a920437f..872785c68a 100644
> --- a/fs/xfs/xfs_icache.c
> +++ b/fs/xfs/xfs_icache.c
> @@ -130,6 +130,9 @@
> spin_lock_init(&ip->i_ioend_lock);
> ip->i_next_unlinked = NULLAGINO;
> ip->i_prev_unlinked = 0;
> +#ifdef CONFIG_FS_VERITY
> + ip->i_verity_info = NULL;
> +#endif
>
> return ip;
> }
> diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
> index bd6d335571..f149cb1eb5 100644
> --- a/fs/xfs/xfs_inode.h
> +++ b/fs/xfs/xfs_inode.h
> @@ -9,6 +9,7 @@
> #include "xfs_inode_buf.h"
> #include "xfs_inode_fork.h"
> #include "xfs_inode_util.h"
> +#include <linux/fsverity.h>
>
> /*
> * Kernel only inode definitions
> @@ -99,6 +100,10 @@
> spinlock_t i_ioend_lock;
> struct work_struct i_ioend_work;
> struct list_head i_ioend_list;
> +
> +#ifdef CONFIG_FS_VERITY
> + struct fsverity_info *i_verity_info;
> +#endif
> } xfs_inode_t;
>
> static inline bool xfs_inode_on_unlinked_list(const struct xfs_inode *ip)
>
> --
> - Andrey
>
>
* Re: [PATCH v2 11/22] xfs: add verity info pointer to xfs inode
2026-01-12 22:39 ` Darrick J. Wong
@ 2026-01-13 8:21 ` Christoph Hellwig
2026-01-13 18:02 ` Darrick J. Wong
0 siblings, 1 reply; 86+ messages in thread
From: Christoph Hellwig @ 2026-01-13 8:21 UTC (permalink / raw)
To: Darrick J. Wong
Cc: Andrey Albershteyn, fsverity, linux-xfs, ebiggers, linux-fsdevel,
aalbersh, david, hch
On Mon, Jan 12, 2026 at 02:39:38PM -0800, Darrick J. Wong wrote:
> On Mon, Jan 12, 2026 at 03:51:10PM +0100, Andrey Albershteyn wrote:
> > Add the fsverity_info pointer into the filesystem-specific part of the
> > inode by adding the field xfs_inode->i_verity_info and configuring
> > fsverity_operations::inode_info_offs accordingly.
> >
> > Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
>
> I kinda don't like adding another pointer to struct xfs_inode
Me neither.
> but I can't see a better solution.
Well, I suggested just making the fsverity_info a standalone object,
and looking it up using a rhashtable based on the ino. Eric didn't
seem overly excited about that, but I think it would be really helpful
to not bloat all the common inodes for fsverity.
* Re: [PATCH v2 11/22] xfs: add verity info pointer to xfs inode
2026-01-13 8:21 ` Christoph Hellwig
@ 2026-01-13 18:02 ` Darrick J. Wong
2026-01-14 6:43 ` Christoph Hellwig
0 siblings, 1 reply; 86+ messages in thread
From: Darrick J. Wong @ 2026-01-13 18:02 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Andrey Albershteyn, fsverity, linux-xfs, ebiggers, linux-fsdevel,
aalbersh, david
On Tue, Jan 13, 2026 at 09:21:58AM +0100, Christoph Hellwig wrote:
> On Mon, Jan 12, 2026 at 02:39:38PM -0800, Darrick J. Wong wrote:
> > On Mon, Jan 12, 2026 at 03:51:10PM +0100, Andrey Albershteyn wrote:
> > > Add the fsverity_info pointer into the filesystem-specific part of the
> > > inode by adding the field xfs_inode->i_verity_info and configuring
> > > fsverity_operations::inode_info_offs accordingly.
> > >
> > > Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
> >
> > I kinda don't like adding another pointer to struct xfs_inode
>
> Me neither.
>
> > but I can't see a better solution.
>
> Well, I suggested just making the fsverity_info a standalone object,
> and looking it up using a rhashtable based on the ino. Eric didn't
> seem overly excited about that, but I think it would be really helpful
> to not bloat all the common inodes for fsverity.
Oh, right, now I remember you asking about using rhashtable to map an
xfs_inode to a fsverity_info. I wonder how much of a performance
overhead that lookup (vs. the pointer deref that's used now) would have
for loading folios into the pagecache.
--D
* Re: [PATCH v2 11/22] xfs: add verity info pointer to xfs inode
2026-01-13 18:02 ` Darrick J. Wong
@ 2026-01-14 6:43 ` Christoph Hellwig
0 siblings, 0 replies; 86+ messages in thread
From: Christoph Hellwig @ 2026-01-14 6:43 UTC (permalink / raw)
To: Darrick J. Wong
Cc: Christoph Hellwig, Andrey Albershteyn, fsverity, linux-xfs,
ebiggers, linux-fsdevel, aalbersh, david
On Tue, Jan 13, 2026 at 10:02:13AM -0800, Darrick J. Wong wrote:
> > Well, I suggested just making the fsverity_info a standalone object,
> > and looking it up using a rhashtable based on the ino. Eric didn't
> > seem overly excited about that, but I think it would be really helpful
> > to not bloat all the common inodes for fsverity.
>
> Oh, right, now I remember you asking about using rhashtable to map an
> xfs_inode to a fsverity_info. I wonder how much of a performance
> overhead that lookup (vs. the pointer deref that's used now) would have
> for loading folios into the pagecache.
If we have to do that on every folio it would suck. But if we rework
the code to only do it once per operation it should be close to
noise.
* [PATCH v2 12/22] xfs: introduce XFS_FSVERITY_CONSTRUCTION inode flag
2026-01-12 14:49 [PATCH v2 0/23] fs-verity support for XFS with post EOF merkle tree Andrey Albershteyn
` (10 preceding siblings ...)
2026-01-12 14:51 ` [PATCH v2 11/22] xfs: add verity info pointer to xfs inode Andrey Albershteyn
@ 2026-01-12 14:51 ` Andrey Albershteyn
2026-01-12 22:42 ` Darrick J. Wong
2026-01-12 14:51 ` [PATCH v2 13/22] xfs: introduce XFS_FSVERITY_REGION_START constant Andrey Albershteyn
` (10 subsequent siblings)
22 siblings, 1 reply; 86+ messages in thread
From: Andrey Albershteyn @ 2026-01-12 14:51 UTC (permalink / raw)
To: fsverity, linux-xfs, ebiggers, linux-fsdevel, aalbersh, aalbersh,
djwong
Cc: djwong, david, hch
Add a new flag meaning that the Merkle tree is being built on the inode.
Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
---
fs/xfs/xfs_inode.h | 6 ++++++
1 file changed, 6 insertions(+), 0 deletions(-)
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index f149cb1eb5..9373bf146a 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -420,6 +420,12 @@
*/
#define XFS_IREMAPPING (1U << 15)
+/*
+ * fs-verity's Merkle tree is under construction. The file is read-only, the
+ * only writes happening is the ones with Merkle tree blocks.
+ */
+#define XFS_VERITY_CONSTRUCTION (1U << 16)
+
/* All inode state flags related to inode reclaim. */
#define XFS_ALL_IRECLAIM_FLAGS (XFS_IRECLAIMABLE | \
XFS_IRECLAIM | \
--
- Andrey
* Re: [PATCH v2 12/22] xfs: introduce XFS_FSVERITY_CONSTRUCTION inode flag
2026-01-12 14:51 ` [PATCH v2 12/22] xfs: introduce XFS_FSVERITY_CONSTRUCTION inode flag Andrey Albershteyn
@ 2026-01-12 22:42 ` Darrick J. Wong
2026-01-13 11:24 ` Andrey Albershteyn
0 siblings, 1 reply; 86+ messages in thread
From: Darrick J. Wong @ 2026-01-12 22:42 UTC (permalink / raw)
To: Andrey Albershteyn
Cc: fsverity, linux-xfs, ebiggers, linux-fsdevel, aalbersh, david,
hch
On Mon, Jan 12, 2026 at 03:51:16PM +0100, Andrey Albershteyn wrote:
> Add new flag meaning that merkle tree is being build on the inode.
>
> Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
Seems fine to me, with one nit below
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
--D
> ---
> fs/xfs/xfs_inode.h | 6 ++++++
> 1 file changed, 6 insertions(+), 0 deletions(-)
>
> diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
> index f149cb1eb5..9373bf146a 100644
> --- a/fs/xfs/xfs_inode.h
> +++ b/fs/xfs/xfs_inode.h
> @@ -420,6 +420,12 @@
> */
> #define XFS_IREMAPPING (1U << 15)
>
> +/*
> + * fs-verity's Merkle tree is under construction. The file is read-only, the
> + * only writes happening is the ones with Merkle tree blocks.
"..the only writes happening are for the fsverity metadata."
since fsverity could in theory someday write more than just a merkle
tree and a descriptor.
--D
> + */
> +#define XFS_VERITY_CONSTRUCTION (1U << 16)
> +
> /* All inode state flags related to inode reclaim. */
> #define XFS_ALL_IRECLAIM_FLAGS (XFS_IRECLAIMABLE | \
> XFS_IRECLAIM | \
>
> --
> - Andrey
>
>
* Re: [PATCH v2 12/22] xfs: introduce XFS_FSVERITY_CONSTRUCTION inode flag
2026-01-12 22:42 ` Darrick J. Wong
@ 2026-01-13 11:24 ` Andrey Albershteyn
0 siblings, 0 replies; 86+ messages in thread
From: Andrey Albershteyn @ 2026-01-13 11:24 UTC (permalink / raw)
To: Darrick J. Wong
Cc: fsverity, linux-xfs, ebiggers, linux-fsdevel, aalbersh, david,
hch
On 2026-01-12 14:42:13, Darrick J. Wong wrote:
> On Mon, Jan 12, 2026 at 03:51:16PM +0100, Andrey Albershteyn wrote:
> > Add new flag meaning that merkle tree is being build on the inode.
> >
> > Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
>
> Seems fine to me, with one nit below
> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Thanks!
>
> --D
>
> > ---
> > fs/xfs/xfs_inode.h | 6 ++++++
> > 1 file changed, 6 insertions(+), 0 deletions(-)
> >
> > diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
> > index f149cb1eb5..9373bf146a 100644
> > --- a/fs/xfs/xfs_inode.h
> > +++ b/fs/xfs/xfs_inode.h
> > @@ -420,6 +420,12 @@
> > */
> > #define XFS_IREMAPPING (1U << 15)
> >
> > +/*
> > + * fs-verity's Merkle tree is under construction. The file is read-only, the
> > + * only writes happening is the ones with Merkle tree blocks.
>
> "..the only writes happening are for the fsverity metadata."
>
> since fsverity could in theory someday write more than just a merkle
> tree and a descriptor.
changed
--
- Andrey
* [PATCH v2 13/22] xfs: introduce XFS_FSVERITY_REGION_START constant
2026-01-12 14:49 [PATCH v2 0/23] fs-verity support for XFS with post EOF merkle tree Andrey Albershteyn
` (11 preceding siblings ...)
2026-01-12 14:51 ` [PATCH v2 12/22] xfs: introduce XFS_FSVERITY_CONSTRUCTION inode flag Andrey Albershteyn
@ 2026-01-12 14:51 ` Andrey Albershteyn
2026-01-12 22:46 ` Darrick J. Wong
2026-01-12 14:51 ` [PATCH v2 14/22] xfs: disable preallocations for fsverity Merkle tree writes Andrey Albershteyn
` (9 subsequent siblings)
22 siblings, 1 reply; 86+ messages in thread
From: Andrey Albershteyn @ 2026-01-12 14:51 UTC (permalink / raw)
To: fsverity, linux-xfs, ebiggers, linux-fsdevel, aalbersh, aalbersh,
djwong
Cc: djwong, david, hch
This constant defines the location of fsverity metadata in the page cache
of an inode.
Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
---
fs/xfs/libxfs/xfs_fs.h | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+), 0 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index 12463ba766..b73458a7c2 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -1106,4 +1106,26 @@
#define BBTOB(bbs) ((bbs) << BBSHIFT)
#endif
+/* Merkle tree location in page cache. We take memory region from the inode's
+ * address space for Merkle tree.
+ *
+ * At maximum of 8 levels with 128 hashes per block (32 bytes SHA-256) maximum
+ * tree size is ((128^8 − 1)/(128 − 1)) = 567*10^12 blocks. This should fit in 53
+ * bits address space.
+ *
+ * At this Merkle tree size we can cover 295EB large file. This is much larger
+ * than the currently supported file size.
+ *
+ * For sha512 the largest file we can cover ends at 1 << 50 offset, this is also
+ * good.
+ *
+ * The metadata is stored on disk as follows:
+ *
+ * [merkle tree...][descriptor.............desc_size]
+ * ^ (1 << 53) ^ (block border) ^ (end of the block)
+ * ^--------------------------------^
+ * Can be FS_VERITY_MAX_DESCRIPTOR_SIZE
+ */
+#define XFS_FSVERITY_REGION_START (1ULL << 53)
+
#endif /* __XFS_FS_H__ */
--
- Andrey
* Re: [PATCH v2 13/22] xfs: introduce XFS_FSVERITY_REGION_START constant
2026-01-12 14:51 ` [PATCH v2 13/22] xfs: introduce XFS_FSVERITY_REGION_START constant Andrey Albershteyn
@ 2026-01-12 22:46 ` Darrick J. Wong
2026-01-13 12:23 ` Andrey Albershteyn
0 siblings, 1 reply; 86+ messages in thread
From: Darrick J. Wong @ 2026-01-12 22:46 UTC (permalink / raw)
To: Andrey Albershteyn
Cc: fsverity, linux-xfs, ebiggers, linux-fsdevel, aalbersh, david,
hch
On Mon, Jan 12, 2026 at 03:51:25PM +0100, Andrey Albershteyn wrote:
> This constant defines location of fsverity metadata in page cache of
> an inode.
>
> Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
> ---
> fs/xfs/libxfs/xfs_fs.h | 22 ++++++++++++++++++++++
> 1 file changed, 22 insertions(+), 0 deletions(-)
>
> diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
> index 12463ba766..b73458a7c2 100644
> --- a/fs/xfs/libxfs/xfs_fs.h
> +++ b/fs/xfs/libxfs/xfs_fs.h
> @@ -1106,4 +1106,26 @@
> #define BBTOB(bbs) ((bbs) << BBSHIFT)
> #endif
>
> +/* Merkle tree location in page cache. We take memory region from the inode's
Dumb nit: new line after opening the multiline comment.
/*
* Merkle tree location in page cache...
also, isn't (1U<<53) the location of the Merkle tree ondisk in addition
to its location in the page cache?
It occurs to me: what happens on 32-bit systems where the pagecache
can only address up to 16T of data? Maybe we just don't allow fsverity
on 32-bit xfs.
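To make the 32-bit concern concrete, here is a rough check, assuming 4k pages and a page index that is as wide as unsigned long. offset_fits_page_index() and SKETCH_PAGE_SHIFT are made-up names for this sketch, not kernel API:

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4k pages */

/* True if byte offset pos maps to a page index that fits in index_bits bits. */
static bool offset_fits_page_index(uint64_t pos, unsigned int index_bits)
{
	uint64_t index = pos >> SKETCH_PAGE_SHIFT;

	/* A 64-bit index can hold any shifted 64-bit offset. */
	if (index_bits >= 64)
		return true;
	return (index >> index_bits) == 0;
}
```

Under those assumptions a 32-bit index covers byte offsets below 1ULL << 44 (16T), so offset_fits_page_index(1ULL << 53, 32) is false, while the same offset fits with a 64-bit index.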
> + * address space for Merkle tree.
> + *
> + * At maximum of 8 levels with 128 hashes per block (32 bytes SHA-256) maximum
> + * tree size is ((128^8 − 1)/(128 − 1)) = 567*10^12 blocks. This should fit in 53
> + * bits address space.
> + *
> + * At this Merkle tree size we can cover 295EB large file. This is much larger
> + * than the currently supported file size.
> + *
> + * For sha512 the largest file we can cover ends at 1 << 50 offset, this is also
> + * good.
> + *
> + * The metadata is stored on disk as follows:
> + *
> + * [merkle tree...][descriptor.............desc_size]
> + * ^ (1 << 53) ^ (block border) ^ (end of the block)
> + * ^--------------------------------^
> + * Can be FS_VERITY_MAX_DESCRIPTOR_SIZE
> + */
> +#define XFS_FSVERITY_REGION_START (1ULL << 53)
Is this in fsblocks or in bytes? I think the comment should state that
explicitly.
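For reference, the block count quoted in the comment can be reproduced with a few lines of C. This only redoes the arithmetic for a fanout of 128 hashes per block and 8 levels; merkle_tree_blocks() is a hypothetical helper, and its result is a count of tree blocks, not bytes:

```c
#include <stdint.h>

/*
 * Total blocks in a full tree with the given fanout and number of levels:
 * 1 + f + f^2 + ... + f^(levels - 1) = (f^levels - 1) / (f - 1)
 */
static uint64_t merkle_tree_blocks(uint64_t fanout, unsigned int levels)
{
	uint64_t total = 0;
	uint64_t level_blocks = 1;	/* the root level has one block */

	for (unsigned int i = 0; i < levels; i++) {
		total += level_blocks;
		level_blocks *= fanout;
	}
	return total;
}
```

merkle_tree_blocks(128, 8) gives 567382630219905, i.e. about 567*10^12 blocks, which is below 1ULL << 53 when counted in blocks.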
--D
> +
> #endif /* __XFS_FS_H__ */
>
> --
> - Andrey
>
>
* Re: [PATCH v2 13/22] xfs: introduce XFS_FSVERITY_REGION_START constant
2026-01-12 22:46 ` Darrick J. Wong
@ 2026-01-13 12:23 ` Andrey Albershteyn
2026-01-13 18:06 ` Darrick J. Wong
0 siblings, 1 reply; 86+ messages in thread
From: Andrey Albershteyn @ 2026-01-13 12:23 UTC (permalink / raw)
To: Darrick J. Wong
Cc: fsverity, linux-xfs, ebiggers, linux-fsdevel, aalbersh, david,
hch
On 2026-01-12 14:46:31, Darrick J. Wong wrote:
> On Mon, Jan 12, 2026 at 03:51:25PM +0100, Andrey Albershteyn wrote:
> > This constant defines location of fsverity metadata in page cache of
> > an inode.
> >
> > Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
> > ---
> > fs/xfs/libxfs/xfs_fs.h | 22 ++++++++++++++++++++++
> > 1 file changed, 22 insertions(+), 0 deletions(-)
> >
> > diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
> > index 12463ba766..b73458a7c2 100644
> > --- a/fs/xfs/libxfs/xfs_fs.h
> > +++ b/fs/xfs/libxfs/xfs_fs.h
> > @@ -1106,4 +1106,26 @@
> > #define BBTOB(bbs) ((bbs) << BBSHIFT)
> > #endif
> >
> > +/* Merkle tree location in page cache. We take memory region from the inode's
>
> Dumb nit: new line after opening the multiline comment.
>
> /*
> * Merkle tree location in page cache...
>
> also, isn't (1U<<53) the location of the Merkle tree ondisk in addition
> to its location in the page cache?
yes, it's a file offset
>
> That occurs to me, what happens on 32-bit systems where the pagecache
> can only address up to 16T of data? Maybe we just don't allow fsverity
> on 32-bit xfs.
hmm right, a check in begin_enable() will probably be enough
>
> > + * address space for Merkle tree.
> > + *
> > + * At maximum of 8 levels with 128 hashes per block (32 bytes SHA-256) maximum
> > + * tree size is ((128^8 − 1)/(128 − 1)) = 567*10^12 blocks. This should fit in 53
> > + * bits address space.
> > + *
> > + * At this Merkle tree size we can cover 295EB large file. This is much larger
> > + * than the currently supported file size.
> > + *
> > + * For sha512 the largest file we can cover ends at 1 << 50 offset, this is also
> > + * good.
> > + *
> > + * The metadata is stored on disk as follows:
> > + *
> > + * [merkle tree...][descriptor.............desc_size]
> > + * ^ (1 << 53) ^ (block border) ^ (end of the block)
> > + * ^--------------------------------^
> > + * Can be FS_VERITY_MAX_DESCRIPTOR_SIZE
> > + */
> > +#define XFS_FSVERITY_REGION_START (1ULL << 53)
>
> Is this in fsblocks or in bytes? I think the comment should state that
> explicitly.
sure, will add it
--
- Andrey
* Re: [PATCH v2 13/22] xfs: introduce XFS_FSVERITY_REGION_START constant
2026-01-13 12:23 ` Andrey Albershteyn
@ 2026-01-13 18:06 ` Darrick J. Wong
2026-01-14 6:47 ` Christoph Hellwig
0 siblings, 1 reply; 86+ messages in thread
From: Darrick J. Wong @ 2026-01-13 18:06 UTC (permalink / raw)
To: Andrey Albershteyn
Cc: fsverity, linux-xfs, ebiggers, linux-fsdevel, aalbersh, david,
hch
On Tue, Jan 13, 2026 at 01:23:06PM +0100, Andrey Albershteyn wrote:
> On 2026-01-12 14:46:31, Darrick J. Wong wrote:
> > On Mon, Jan 12, 2026 at 03:51:25PM +0100, Andrey Albershteyn wrote:
> > > This constant defines location of fsverity metadata in page cache of
> > > an inode.
> > >
> > > Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
> > > ---
> > > fs/xfs/libxfs/xfs_fs.h | 22 ++++++++++++++++++++++
> > > 1 file changed, 22 insertions(+), 0 deletions(-)
> > >
> > > diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
> > > index 12463ba766..b73458a7c2 100644
> > > --- a/fs/xfs/libxfs/xfs_fs.h
> > > +++ b/fs/xfs/libxfs/xfs_fs.h
> > > @@ -1106,4 +1106,26 @@
> > > #define BBTOB(bbs) ((bbs) << BBSHIFT)
> > > #endif
> > >
> > > +/* Merkle tree location in page cache. We take memory region from the inode's
> >
> > Dumb nit: new line after opening the multiline comment.
> >
> > /*
> > * Merkle tree location in page cache...
> >
> > also, isn't (1U<<53) the location of the Merkle tree ondisk in addition
> > to its location in the page cache?
>
> yes, it's file offset
>
> >
> > That occurs to me, what happens on 32-bit systems where the pagecache
> > can only address up to 16T of data? Maybe we just don't allow fsverity
> > on 32-bit xfs.
>
> hmm right, check in begin_enable() will be probably enough
I think that would probably be more of a mount-time prohibition?
Which would be worse -- any fsverity filesystem refuses to mount on
32-bit; or it mounts but none of the fsverity files are readable?
Alternately I guess for 32-bit you could cheat in ->iomap_begin
by loading the fsverity artifacts into the pagecache at 1<<39 instead of
1<<53, provided the file is smaller than 1<<39 bytes. Writing the
fsverity metadata would perform the reverse translation.
(Or again we just don't allow mounting of fsverity on 32-bit kernels.)
--D
> > > + * address space for Merkle tree.
> > > + *
> > > + * At maximum of 8 levels with 128 hashes per block (32 bytes SHA-256) maximum
> > > + * tree size is ((128^8 − 1)/(128 − 1)) = 567*10^12 blocks. This should fit in 53
> > > + * bits address space.
> > > + *
> > > + * At this Merkle tree size we can cover 295EB large file. This is much larger
> > > + * than the currently supported file size.
> > > + *
> > > + * For sha512 the largest file we can cover ends at 1 << 50 offset, this is also
> > > + * good.
> > > + *
> > > + * The metadata is stored on disk as follows:
> > > + *
> > > + * [merkle tree...][descriptor.............desc_size]
> > > + * ^ (1 << 53) ^ (block border) ^ (end of the block)
> > > + * ^--------------------------------^
> > > + * Can be FS_VERITY_MAX_DESCRIPTOR_SIZE
> > > + */
> > > +#define XFS_FSVERITY_REGION_START (1ULL << 53)
> >
> > Is this in fsblocks or in bytes? I think the comment should state that
> > explicitly.
>
> sure, will add it
>
> --
> - Andrey
>
>
* Re: [PATCH v2 13/22] xfs: introduce XFS_FSVERITY_REGION_START constant
2026-01-13 18:06 ` Darrick J. Wong
@ 2026-01-14 6:47 ` Christoph Hellwig
2026-01-14 7:59 ` Andrey Albershteyn
2026-01-14 16:50 ` Darrick J. Wong
0 siblings, 2 replies; 86+ messages in thread
From: Christoph Hellwig @ 2026-01-14 6:47 UTC (permalink / raw)
To: Darrick J. Wong
Cc: Andrey Albershteyn, fsverity, linux-xfs, ebiggers, linux-fsdevel,
aalbersh, david, hch
On Tue, Jan 13, 2026 at 10:06:55AM -0800, Darrick J. Wong wrote:
> > hmm right, check in begin_enable() will be probably enough
>
> I think that would probably be more of a mount-time prohibition?
>
> Which would be worse -- any fsverity filesystem refuses to mount on
> 32-bit; or it mounts but none of the fsverity files are readable?
>
> Alternately I guess for 32-bit you could cheat in ->iomap_begin
> by loading the fsverity artifacts into the pagecache at 1<<39 instead of
> 1<<53, provided the file is smaller than 1<<39 bytes. Writing the
> fsverity metadata would perform the reverse translation.
>
> (Or again we just don't allow mounting of fsverity on 32-bit kernels.)
What are the other file systems doing here? Unless we have a good
reason to differ we should just follow the precedent.
* Re: [PATCH v2 13/22] xfs: introduce XFS_FSVERITY_REGION_START constant
2026-01-14 6:47 ` Christoph Hellwig
@ 2026-01-14 7:59 ` Andrey Albershteyn
2026-01-14 16:50 ` Darrick J. Wong
1 sibling, 0 replies; 86+ messages in thread
From: Andrey Albershteyn @ 2026-01-14 7:59 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Darrick J. Wong, fsverity, linux-xfs, ebiggers, linux-fsdevel,
aalbersh, david
On 2026-01-14 07:47:35, Christoph Hellwig wrote:
> On Tue, Jan 13, 2026 at 10:06:55AM -0800, Darrick J. Wong wrote:
> > > hmm right, check in begin_enable() will be probably enough
> >
> > I think that would probably be more of a mount-time prohibition?
> >
> > Which would be worse -- any fsverity filesystem refuses to mount on
> > 32-bit; or it mounts but none of the fsverity files are readable?
> >
> > Alternately I guess for 32-bit you could cheat in ->iomap_begin
> > by loading the fsverity artifacts into the pagecache at 1<<39 instead of
> > 1<<53, provided the file is smaller than 1<<39 bytes. Writing the
> > fsverity metadata would perform the reverse translation.
> >
> > (Or again we just don't allow mounting of fsverity on 32-bit kernels.)
>
> What are the other file systems doing here? Unless we have a good
> reason to differ we should just follow the precedent.
>
ext4/f2fs/btrfs do this:
round_up(inode->i_size, 65536);
It's not a fixed offset but the next 64k-aligned offset at or after EOF.
As EOF won't move, this is fine. So this works for both 32-bit and 64-bit.
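Roughly, as a standalone sketch (verity_metadata_pos() and round_up_pow2() are made-up names here, and the helper assumes the alignment is a power of two):

```c
#include <stdint.h>

/* Round x up to the next multiple of align; align must be a power of two. */
static uint64_t round_up_pow2(uint64_t x, uint64_t align)
{
	return (x + align - 1) & ~(align - 1);
}

/* Where the verity metadata would start for a file of size i_size. */
static uint64_t verity_metadata_pos(uint64_t i_size)
{
	return round_up_pow2(i_size, 65536);
}
```

Since the result tracks i_size, a small file keeps its metadata at a small offset regardless of how wide the page index is.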
--
- Andrey
* Re: [PATCH v2 13/22] xfs: introduce XFS_FSVERITY_REGION_START constant
2026-01-14 6:47 ` Christoph Hellwig
2026-01-14 7:59 ` Andrey Albershteyn
@ 2026-01-14 16:50 ` Darrick J. Wong
1 sibling, 0 replies; 86+ messages in thread
From: Darrick J. Wong @ 2026-01-14 16:50 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Andrey Albershteyn, fsverity, linux-xfs, ebiggers, linux-fsdevel,
aalbersh, david
On Wed, Jan 14, 2026 at 07:47:35AM +0100, Christoph Hellwig wrote:
> On Tue, Jan 13, 2026 at 10:06:55AM -0800, Darrick J. Wong wrote:
> > > hmm right, check in begin_enable() will be probably enough
> >
> > I think that would probably be more of a mount-time prohibition?
> >
> > Which would be worse -- any fsverity filesystem refuses to mount on
> > 32-bit; or it mounts but none of the fsverity files are readable?
> >
> > Alternately I guess for 32-bit you could cheat in ->iomap_begin
> > by loading the fsverity artifacts into the pagecache at 1<<39 instead of
> > 1<<53, provided the file is smaller than 1<<39 bytes. Writing the
> > fsverity metadata would perform the reverse translation.
> >
> > (Or again we just don't allow mounting of fsverity on 32-bit kernels.)
>
> What are the other file systems doing here? Unless we have a good
> reason to differ we should just follow the precedent.
ext4/f2fs place the fsverity metadata at roundup(i_size, 64k) and who
knows what happens if any of them ever enable large folios or have a
gigantic base page size (looking at you, arch/hexagon/).
btrfs seems to persist the merkle tree elsewhere but it loads it into
the pagecache at the same rounded-up address.
--D
* [PATCH v2 14/22] xfs: disable preallocations for fsverity Merkle tree writes
2026-01-12 14:49 [PATCH v2 0/23] fs-verity support for XFS with post EOF merkle tree Andrey Albershteyn
` (12 preceding siblings ...)
2026-01-12 14:51 ` [PATCH v2 13/22] xfs: introduce XFS_FSVERITY_REGION_START constant Andrey Albershteyn
@ 2026-01-12 14:51 ` Andrey Albershteyn
2026-01-12 22:49 ` Darrick J. Wong
2026-01-12 14:51 ` [PATCH v2 15/22] xfs: add writeback and iomap reading of Merkle tree pages Andrey Albershteyn
` (8 subsequent siblings)
22 siblings, 1 reply; 86+ messages in thread
From: Andrey Albershteyn @ 2026-01-12 14:51 UTC (permalink / raw)
To: fsverity, linux-xfs, ebiggers, linux-fsdevel, aalbersh, aalbersh,
djwong
Cc: djwong, david, hch
While the Merkle tree is being written, the file is read-only and there
are no further writes except for Merkle tree building. The file is
truncated beforehand to remove any preallocated extents.
The Merkle tree is the only data XFS will write. As we don't want XFS to
truncate the file after we are done writing, let's also skip truncation on
fsverity files. Therefore, we also need to disable preallocations while
writing the Merkle tree, as we don't want any unused extents past the tree.
Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
---
fs/xfs/xfs_iomap.c | 28 +++++++++++++++++++++++++---
1 file changed, 25 insertions(+), 3 deletions(-)
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 04f39ea158..61aab5617f 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1938,7 +1938,9 @@
* Determine the initial size of the preallocation.
* We clean up any extra preallocation when the file is closed.
*/
- if (xfs_has_allocsize(mp))
+ if (xfs_iflags_test(ip, XFS_VERITY_CONSTRUCTION))
+ prealloc_blocks = 0;
+ else if (xfs_has_allocsize(mp))
prealloc_blocks = mp->m_allocsize_blocks;
else if (allocfork == XFS_DATA_FORK)
prealloc_blocks = xfs_iomap_prealloc_size(ip, allocfork,
@@ -2065,6 +2067,13 @@
if (flags & IOMAP_FAULT)
return 0;
+ /*
+ * While writing Merkle tree to disk we would not have any other
+ * delayed allocations
+ */
+ if (xfs_iflags_test(XFS_I(inode), XFS_VERITY_CONSTRUCTION))
+ return 0;
+
/* Nothing to do if we've written the entire delalloc extent */
start_byte = iomap_last_written_block(inode, offset, written);
end_byte = round_up(offset + length, i_blocksize(inode));
@@ -2109,6 +2118,7 @@
bool shared = false;
unsigned int lockmode = XFS_ILOCK_SHARED;
u64 seq;
+ int iomap_flags;
ASSERT(!(flags & (IOMAP_WRITE | IOMAP_ZERO)));
@@ -2128,8 +2138,20 @@
if (error)
return error;
trace_xfs_iomap_found(ip, offset, length, XFS_DATA_FORK, &imap);
- return xfs_bmbt_to_iomap(ip, iomap, &imap, flags,
- shared ? IOMAP_F_SHARED : 0, seq);
+ iomap_flags = shared ? IOMAP_F_SHARED : 0;
+
+ /*
+ * We can not use fsverity_active() here. fsverity_active() checks for
+ * verity info attached to inode. This info is based on data from a
+ * verity descriptor. But to read verity descriptor we need to go
+ * through read iomap path (this function). So, when descriptor is read
+ * we will not set IOMAP_F_BEYOND_EOF and descriptor page will be empty
+ * (post EOF hole).
+ */
+ if ((offset >= XFS_FSVERITY_REGION_START) && IS_VERITY(inode))
+ iomap_flags |= IOMAP_F_BEYOND_EOF;
+
+ return xfs_bmbt_to_iomap(ip, iomap, &imap, flags, iomap_flags, seq);
}
const struct iomap_ops xfs_read_iomap_ops = {
--
- Andrey
* Re: [PATCH v2 14/22] xfs: disable preallocations for fsverity Merkle tree writes
2026-01-12 14:51 ` [PATCH v2 14/22] xfs: disable preallocations for fsverity Merkle tree writes Andrey Albershteyn
@ 2026-01-12 22:49 ` Darrick J. Wong
0 siblings, 0 replies; 86+ messages in thread
From: Darrick J. Wong @ 2026-01-12 22:49 UTC (permalink / raw)
To: Andrey Albershteyn
Cc: fsverity, linux-xfs, ebiggers, linux-fsdevel, aalbersh, david,
hch
On Mon, Jan 12, 2026 at 03:51:30PM +0100, Andrey Albershteyn wrote:
> While writing Merkle tree, file is read-only and there's no further
> writes except Merkle tree building. The file is truncated beforehand to
> remove any preallocated extents.
>
> The Merkle tree is the only data XFS will write. As we don't want XFS to
> truncate the file after we are done writing, let's also skip truncation on
> fsverity files. Therefore, we also need to disable preallocations while
> writing the Merkle tree, as we don't want any unused extents past the tree.
>
> Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
> ---
> fs/xfs/xfs_iomap.c | 28 +++++++++++++++++++++++++---
> 1 file changed, 25 insertions(+), 3 deletions(-)
>
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index 04f39ea158..61aab5617f 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -1938,7 +1938,9 @@
> * Determine the initial size of the preallocation.
> * We clean up any extra preallocation when the file is closed.
> */
> - if (xfs_has_allocsize(mp))
> + if (xfs_iflags_test(ip, XFS_VERITY_CONSTRUCTION))
> + prealloc_blocks = 0;
> + else if (xfs_has_allocsize(mp))
> prealloc_blocks = mp->m_allocsize_blocks;
> else if (allocfork == XFS_DATA_FORK)
> prealloc_blocks = xfs_iomap_prealloc_size(ip, allocfork,
> @@ -2065,6 +2067,13 @@
> if (flags & IOMAP_FAULT)
> return 0;
>
> + /*
> + * While writing Merkle tree to disk we would not have any other
> + * delayed allocations
> + */
> + if (xfs_iflags_test(XFS_I(inode), XFS_VERITY_CONSTRUCTION))
> + return 0;
> +
> /* Nothing to do if we've written the entire delalloc extent */
> start_byte = iomap_last_written_block(inode, offset, written);
> end_byte = round_up(offset + length, i_blocksize(inode));
> @@ -2109,6 +2118,7 @@
> bool shared = false;
> unsigned int lockmode = XFS_ILOCK_SHARED;
> u64 seq;
> + int iomap_flags;
>
> ASSERT(!(flags & (IOMAP_WRITE | IOMAP_ZERO)));
>
> @@ -2128,8 +2138,20 @@
> if (error)
> return error;
> trace_xfs_iomap_found(ip, offset, length, XFS_DATA_FORK, &imap);
> - return xfs_bmbt_to_iomap(ip, iomap, &imap, flags,
> - shared ? IOMAP_F_SHARED : 0, seq);
> + iomap_flags = shared ? IOMAP_F_SHARED : 0;
> +
> + /*
> + * We can not use fsverity_active() here. fsverity_active() checks for
> + * verity info attached to inode. This info is based on data from a
> + * verity descriptor. But to read verity descriptor we need to go
> + * through read iomap path (this function). So, when descriptor is read
> + * we will not set IOMAP_F_BEYOND_EOF and descriptor page will be empty
> + * (post EOF hole).
> + */
> + if ((offset >= XFS_FSVERITY_REGION_START) && IS_VERITY(inode))
(Ok so XFS_FSVERITY_REGION_START is in units of bytes. Maybe cast it to
loff_t to make that clearer?)
#define XFS_FSVERITY_REGION_START ((loff_t)1ULL << 53)
> + iomap_flags |= IOMAP_F_BEYOND_EOF;
Ok, this answers my question. IOMAP_F_BEYOND_EOF is indeed only
intended to be set for mappings that are completely beyond EOF. IOWs,
for the fsverity metadata but not the regular file data.
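To spell that rule out, here is a small userspace sketch (not kernel
code). The flag values and all sketch_* / SKETCH_* names are made up for
illustration; only the 1 << 53 byte offset and the shape of the
condition are taken from the patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Byte offset where the fsverity metadata region starts (from the patch). */
#define SKETCH_REGION_START	((int64_t)1 << 53)

/* Stand-in flag bits for illustration only; not the kernel's definitions. */
#define SKETCH_F_SHARED		(1u << 0)
#define SKETCH_F_BEYOND_EOF	(1u << 1)

/*
 * Decide the flags for a read mapping: only mappings that start inside
 * the metadata region of a verity inode are marked as beyond EOF.
 */
static unsigned int
sketch_read_iomap_flags(int64_t offset, bool shared, bool is_verity)
{
	unsigned int flags = shared ? SKETCH_F_SHARED : 0;

	assert(offset >= 0);	/* file offsets are never negative */
	if (offset >= SKETCH_REGION_START && is_verity)
		flags |= SKETCH_F_BEYOND_EOF;
	return flags;
}
```

Regular file data (any offset below the region start) never gets the
beyond-EOF bit, regardless of whether the inode has verity enabled.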
Looks fine to me,
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
--D
> +
> + return xfs_bmbt_to_iomap(ip, iomap, &imap, flags, iomap_flags, seq);
> }
>
> const struct iomap_ops xfs_read_iomap_ops = {
>
> --
> - Andrey
>
>
^ permalink raw reply [flat|nested] 86+ messages in thread
* [PATCH v2 15/22] xfs: add writeback and iomap reading of Merkle tree pages
2026-01-12 14:49 [PATCH v2 0/23] fs-verity support for XFS with post EOF merkle tree Andrey Albershteyn
` (13 preceding siblings ...)
2026-01-12 14:51 ` [PATCH v2 14/22] xfs: disable preallocations for fsverity Merkle tree writes Andrey Albershteyn
@ 2026-01-12 14:51 ` Andrey Albershteyn
2026-01-12 22:51 ` Darrick J. Wong
2026-01-12 14:51 ` [PATCH v2 16/22] xfs: add fs-verity support Andrey Albershteyn
` (7 subsequent siblings)
22 siblings, 1 reply; 86+ messages in thread
From: Andrey Albershteyn @ 2026-01-12 14:51 UTC (permalink / raw)
To: fsverity, linux-xfs, ebiggers, linux-fsdevel, aalbersh, aalbersh,
djwong
Cc: djwong, david, hch
In the writeback path, use the unbound write interface, meaning that the
inode size is not updated and none of the file size checks are applied.
In the read path, let iomap know via a flag that the data is stored
beyond EOF, so that post-EOF zeroing is skipped.
Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
---
fs/xfs/xfs_aops.c | 20 +++++++++++++++++++-
1 file changed, 19 insertions(+), 1 deletion(-)
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 56a5446384..30e38d5322 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -22,6 +22,7 @@
#include "xfs_icache.h"
#include "xfs_zone_alloc.h"
#include "xfs_rtgroup.h"
+#include "xfs_fsverity.h"
struct xfs_writepage_ctx {
struct iomap_writepage_ctx ctx;
@@ -334,6 +335,7 @@
int retries = 0;
int error = 0;
unsigned int *seq;
+ unsigned int iomap_flags = 0;
if (xfs_is_shutdown(mp))
return -EIO;
@@ -427,7 +429,9 @@
isnullstartblock(imap.br_startblock))
goto allocate_blocks;
- xfs_bmbt_to_iomap(ip, &wpc->iomap, &imap, 0, 0, XFS_WPC(wpc)->data_seq);
+ if (xfs_iflags_test(ip, XFS_VERITY_CONSTRUCTION))
+ iomap_flags |= IOMAP_F_BEYOND_EOF;
+ xfs_bmbt_to_iomap(ip, &wpc->iomap, &imap, 0, iomap_flags, XFS_WPC(wpc)->data_seq);
trace_xfs_map_blocks_found(ip, offset, count, whichfork, &imap);
return 0;
allocate_blocks:
@@ -470,6 +474,9 @@
wpc->iomap.length = cow_offset - wpc->iomap.offset;
}
+ if (offset >= XFS_FSVERITY_REGION_START)
+ wpc->iomap.flags |= IOMAP_F_BEYOND_EOF;
+
ASSERT(wpc->iomap.offset <= offset);
ASSERT(wpc->iomap.offset + wpc->iomap.length > offset);
trace_xfs_map_blocks_alloc(ip, offset, count, whichfork, &imap);
@@ -698,6 +705,17 @@
},
};
+ if (xfs_iflags_test(ip, XFS_VERITY_CONSTRUCTION)) {
+ wbc->range_start = XFS_FSVERITY_REGION_START;
+ wbc->range_end = LLONG_MAX;
+ wbc->nr_to_write = LONG_MAX;
+ /*
+ * Set IOMAP_F_BEYOND_EOF to skip initial EOF check
+ * The following iomap->flags would be set in
+ * xfs_map_blocks()
+ */
+ wpc.ctx.iomap.flags |= IOMAP_F_BEYOND_EOF;
+ }
return iomap_writepages(&wpc.ctx);
}
}
--
- Andrey
^ permalink raw reply related [flat|nested] 86+ messages in thread
* Re: [PATCH v2 15/22] xfs: add writeback and iomap reading of Merkle tree pages
2026-01-12 14:51 ` [PATCH v2 15/22] xfs: add writeback and iomap reading of Merkle tree pages Andrey Albershteyn
@ 2026-01-12 22:51 ` Darrick J. Wong
2026-01-13 8:23 ` Christoph Hellwig
0 siblings, 1 reply; 86+ messages in thread
From: Darrick J. Wong @ 2026-01-12 22:51 UTC (permalink / raw)
To: Andrey Albershteyn
Cc: fsverity, linux-xfs, ebiggers, linux-fsdevel, aalbersh, david,
hch
On Mon, Jan 12, 2026 at 03:51:36PM +0100, Andrey Albershteyn wrote:
> In the writeback path, use the unbound write interface, meaning that the
> inode size is not updated and none of the file size checks are applied.
>
> In the read path, let iomap know via a flag that the data is stored
> beyond EOF, so that post-EOF zeroing is skipped.
>
> Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
> ---
> fs/xfs/xfs_aops.c | 20 +++++++++++++++++++-
> 1 file changed, 19 insertions(+), 1 deletion(-)
>
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 56a5446384..30e38d5322 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -22,6 +22,7 @@
> #include "xfs_icache.h"
> #include "xfs_zone_alloc.h"
> #include "xfs_rtgroup.h"
> +#include "xfs_fsverity.h"
>
> struct xfs_writepage_ctx {
> struct iomap_writepage_ctx ctx;
> @@ -334,6 +335,7 @@
> int retries = 0;
> int error = 0;
> unsigned int *seq;
> + unsigned int iomap_flags = 0;
>
> if (xfs_is_shutdown(mp))
> return -EIO;
> @@ -427,7 +429,9 @@
> isnullstartblock(imap.br_startblock))
> goto allocate_blocks;
>
> - xfs_bmbt_to_iomap(ip, &wpc->iomap, &imap, 0, 0, XFS_WPC(wpc)->data_seq);
> + if (xfs_iflags_test(ip, XFS_VERITY_CONSTRUCTION))
> + iomap_flags |= IOMAP_F_BEYOND_EOF;
> + xfs_bmbt_to_iomap(ip, &wpc->iomap, &imap, 0, iomap_flags, XFS_WPC(wpc)->data_seq);
> trace_xfs_map_blocks_found(ip, offset, count, whichfork, &imap);
> return 0;
> allocate_blocks:
> @@ -470,6 +474,9 @@
> wpc->iomap.length = cow_offset - wpc->iomap.offset;
> }
>
> + if (offset >= XFS_FSVERITY_REGION_START)
> + wpc->iomap.flags |= IOMAP_F_BEYOND_EOF;
> +
> ASSERT(wpc->iomap.offset <= offset);
> ASSERT(wpc->iomap.offset + wpc->iomap.length > offset);
> trace_xfs_map_blocks_alloc(ip, offset, count, whichfork, &imap);
> @@ -698,6 +705,17 @@
> },
> };
>
> + if (xfs_iflags_test(ip, XFS_VERITY_CONSTRUCTION)) {
> + wbc->range_start = XFS_FSVERITY_REGION_START;
> + wbc->range_end = LLONG_MAX;
> + wbc->nr_to_write = LONG_MAX;
> + /*
> + * Set IOMAP_F_BEYOND_EOF to skip initial EOF check
> + * The following iomap->flags would be set in
> + * xfs_map_blocks()
> + */
> + wpc.ctx.iomap.flags |= IOMAP_F_BEYOND_EOF;
But won't xfs_map_blocks reset wpc.ctx.iomap.flags?
/me realizes that you /are/ using writeback for writing the fsverity
metadata now, so he'll go back and look at the iomap patches a little
closer.
--D
> + }
> return iomap_writepages(&wpc.ctx);
> }
> }
>
> --
> - Andrey
>
>
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH v2 15/22] xfs: add writeback and iomap reading of Merkle tree pages
2026-01-12 22:51 ` Darrick J. Wong
@ 2026-01-13 8:23 ` Christoph Hellwig
2026-01-13 12:31 ` Andrey Albershteyn
0 siblings, 1 reply; 86+ messages in thread
From: Christoph Hellwig @ 2026-01-13 8:23 UTC (permalink / raw)
To: Darrick J. Wong
Cc: Andrey Albershteyn, fsverity, linux-xfs, ebiggers, linux-fsdevel,
aalbersh, david, hch
On Mon, Jan 12, 2026 at 02:51:21PM -0800, Darrick J. Wong wrote:
> > + wpc.ctx.iomap.flags |= IOMAP_F_BEYOND_EOF;
>
> But won't xfs_map_blocks reset wpc.ctx.iomap.flags?
>
> /me realizes that you /are/ using writeback for writing the fsverity
> metadata now, so he'll go back and look at the iomap patches a little
> closer.
The real question to me is why we're doing that, instead of doing a
simple direct I/O-style I/O (or even doing actual dio using a bvec
buffer)?
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH v2 15/22] xfs: add writeback and iomap reading of Merkle tree pages
2026-01-13 8:23 ` Christoph Hellwig
@ 2026-01-13 12:31 ` Andrey Albershteyn
0 siblings, 0 replies; 86+ messages in thread
From: Andrey Albershteyn @ 2026-01-13 12:31 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Darrick J. Wong, fsverity, linux-xfs, ebiggers, linux-fsdevel,
aalbersh, david
On 2026-01-13 09:23:17, Christoph Hellwig wrote:
> On Mon, Jan 12, 2026 at 02:51:21PM -0800, Darrick J. Wong wrote:
> > > + wpc.ctx.iomap.flags |= IOMAP_F_BEYOND_EOF;
> >
> > But won't xfs_map_blocks reset wpc.ctx.iomap.flags?
It will; this one is only for the initial run. It is what lets the
first folio pass the iomap_writeback_handle_eof() check in
iomap_writeback_folio(). Further folios would have this
flag set in xfs_map_blocks().
> >
> > /me realizes that you /are/ using writeback for writing the fsverity
> > metadata now, so he'll go back and look at the iomap patches a little
> > closer.
>
> The real question to me is why we're doing that, instead of doing a
> simple direct I/O-style I/O (or even doing actual dio using a bvec
> buffer)?
>
fs-verity uses the page cache to cache the verified status, by setting
a flag on the verified pages.
--
- Andrey
^ permalink raw reply [flat|nested] 86+ messages in thread
* [PATCH v2 16/22] xfs: add fs-verity support
2026-01-12 14:49 [PATCH v2 0/23] fs-verity support for XFS with post EOF merkle tree Andrey Albershteyn
` (14 preceding siblings ...)
2026-01-12 14:51 ` [PATCH v2 15/22] xfs: add writeback and iomap reading of Merkle tree pages Andrey Albershteyn
@ 2026-01-12 14:51 ` Andrey Albershteyn
2026-01-12 23:05 ` Darrick J. Wong
2026-01-12 14:51 ` [PATCH v2 17/22] xfs: add fs-verity ioctls Andrey Albershteyn
` (6 subsequent siblings)
22 siblings, 1 reply; 86+ messages in thread
From: Andrey Albershteyn @ 2026-01-12 14:51 UTC (permalink / raw)
To: fsverity, linux-xfs, ebiggers, linux-fsdevel, aalbersh, aalbersh,
djwong
Cc: djwong, david, hch
Add integration with fs-verity. XFS stores the fs-verity descriptor and
Merkle tree in the inode data fork at file offset (1 << 53).
The Merkle tree is read and written through the iomap interface. The
data itself is read into the inode's page cache. When XFS reads from this
region, iomap doesn't call into fsverity to verify it against the Merkle
tree. For data, verification is done on BIO completion in a workqueue.
When fs-verity is enabled on an inode, the XFS_IVERITY_CONSTRUCTION
flag is set, meaning that the Merkle tree is being built. The
initialization ends with storing the verity descriptor and setting the
on-disk inode flag (XFS_DIFLAG2_VERITY).
The descriptor is stored in a new block after the last Merkle tree
block. The size of the descriptor is stored at the end of the last
descriptor block (the descriptor can span multiple blocks).
Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
---
fs/xfs/Makefile | 1 +
fs/xfs/xfs_bmap_util.c | 7 +
fs/xfs/xfs_fsverity.c | 376 ++++++++++++++++++++++++++++++++++++++++++++++++++
fs/xfs/xfs_fsverity.h | 12 +
fs/xfs/xfs_message.c | 4 +
fs/xfs/xfs_message.h | 1 +
fs/xfs/xfs_super.c | 14 +
7 files changed, 415 insertions(+), 0 deletions(-)
diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 5bf501cf82..ad66439db7 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -147,6 +147,7 @@
xfs-$(CONFIG_SYSCTL) += xfs_sysctl.o
xfs-$(CONFIG_COMPAT) += xfs_ioctl32.o
xfs-$(CONFIG_EXPORTFS_BLOCK_OPS) += xfs_pnfs.o
+xfs-$(CONFIG_FS_VERITY) += xfs_fsverity.o
# notify failure
ifeq ($(CONFIG_MEMORY_FAILURE),y)
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 2208a720ec..79a255a3ac 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -31,6 +31,7 @@
#include "xfs_rtbitmap.h"
#include "xfs_rtgroup.h"
#include "xfs_zone_alloc.h"
+#include <linux/fsverity.h>
/* Kernel only BMAP related definitions and functions */
@@ -554,6 +555,12 @@
return false;
/*
+ * Nothing to clean on fsverity inodes as they are read-only
+ */
+ if (IS_VERITY(VFS_I(ip)))
+ return false;
+
+ /*
* Check if there is an post-EOF extent to free. If there are any
* delalloc blocks attached to the inode (data fork delalloc
* reservations or CoW extents of any kind), we need to free them so
diff --git a/fs/xfs/xfs_fsverity.c b/fs/xfs/xfs_fsverity.c
new file mode 100644
index 0000000000..691dc60778
--- /dev/null
+++ b/fs/xfs/xfs_fsverity.c
@@ -0,0 +1,376 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2025 Red Hat, Inc.
+ */
+#include "xfs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_da_format.h"
+#include "xfs_da_btree.h"
+#include "xfs_inode.h"
+#include "xfs_log_format.h"
+#include "xfs_bmap_util.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_trace.h"
+#include "xfs_quota.h"
+#include "xfs_fsverity.h"
+#include "xfs_iomap.h"
+#include <linux/fsverity.h>
+#include <linux/pagemap.h>
+
+static int
+xfs_fsverity_read(
+ struct inode *inode,
+ void *buf,
+ size_t count,
+ loff_t pos)
+{
+ struct folio *folio;
+ size_t n;
+
+ while (count) {
+ folio = read_mapping_folio(inode->i_mapping, pos >> PAGE_SHIFT,
+ NULL);
+ if (IS_ERR(folio))
+ return PTR_ERR(folio);
+
+ n = memcpy_from_file_folio(buf, folio, pos, count);
+ folio_put(folio);
+
+ buf += n;
+ pos += n;
+ count -= n;
+ }
+ return 0;
+}
+
+static int
+xfs_fsverity_write(
+ struct xfs_inode *ip,
+ loff_t pos,
+ size_t length,
+ const void *buf)
+{
+ int ret;
+ struct iov_iter iter;
+ struct kvec kvec = {
+ .iov_base = (void *)buf,
+ .iov_len = length,
+ };
+ struct kiocb iocb = {
+ /*
+ * We don't have file here, but iomap_file_buffered_write uses
+ * it only to obtain inode, so, pass inode as private arg
+ * directly
+ */
+ .ki_filp = NULL,
+ .ki_ioprio = get_current_ioprio(),
+ .ki_pos = pos,
+ };
+
+ iov_iter_kvec(&iter, WRITE, &kvec, 1, length);
+ ret = iomap_file_buffered_write(&iocb, &iter,
+ &xfs_buffered_write_iomap_ops, &xfs_iomap_write_ops,
+ VFS_I(ip));
+ if (ret < 0)
+ return ret;
+
+ return 0;
+}
+
+/*
+ * Retrieve the verity descriptor.
+ */
+static int
+xfs_fsverity_get_descriptor(
+ struct inode *inode,
+ void *buf,
+ size_t buf_size)
+{
+ struct xfs_inode *ip = XFS_I(inode);
+ __be32 d_desc_size;
+ u32 desc_size;
+ u64 desc_size_pos;
+ int error;
+ u64 desc_pos;
+ struct xfs_bmbt_irec rec;
+ int is_empty;
+ uint32_t blocksize = i_blocksize(VFS_I(ip));
+ xfs_fileoff_t last_block;
+
+ ASSERT(inode->i_flags & S_VERITY);
+ error = xfs_bmap_last_extent(NULL, ip, XFS_DATA_FORK, &rec, &is_empty);
+ if (error)
+ return error;
+
+ if (is_empty)
+ return -ENODATA;
+
+ last_block = (rec.br_startoff + rec.br_blockcount);
+ desc_size_pos = (last_block << ip->i_mount->m_sb.sb_blocklog) -
+ sizeof(__be32);
+ error = xfs_fsverity_read(inode, (char *)&d_desc_size,
+ sizeof(d_desc_size), desc_size_pos);
+ if (error)
+ return error;
+
+ desc_size = be32_to_cpu(d_desc_size);
+ if (desc_size > FS_VERITY_MAX_DESCRIPTOR_SIZE || desc_size > desc_size_pos)
+ return -ERANGE;
+
+ if (!buf_size)
+ return desc_size;
+
+ if (desc_size > buf_size)
+ return -ERANGE;
+
+ desc_pos = round_down(desc_size_pos - desc_size, blocksize);
+ error = xfs_fsverity_read(inode, buf, desc_size, desc_pos);
+ if (error)
+ return error;
+
+ return desc_size;
+}
+
+static int
+xfs_fsverity_write_descriptor(
+ struct xfs_inode *ip,
+ const void *desc,
+ u32 desc_size,
+ u64 merkle_tree_size)
+{
+ int error;
+ unsigned int blksize = ip->i_mount->m_attr_geo->blksize;
+ u64 desc_pos = round_up(
+ XFS_FSVERITY_REGION_START | merkle_tree_size, blksize);
+ u64 desc_end = desc_pos + desc_size;
+ __be32 desc_size_disk = cpu_to_be32(desc_size);
+ u64 desc_size_pos =
+ round_up(desc_end + sizeof(desc_size_disk), blksize) -
+ sizeof(desc_size_disk);
+
+ error = xfs_fsverity_write(ip, desc_size_pos,
+ sizeof(__be32),
+ (const void *)&desc_size_disk);
+ if (error)
+ return error;
+
+ error = xfs_fsverity_write(ip, desc_pos, desc_size, desc);
+
+ return error;
+}
+
+/*
+ * Try to remove all the fsverity metadata after a failed enablement.
+ */
+static int
+xfs_fsverity_delete_metadata(
+ struct xfs_inode *ip)
+{
+ struct xfs_trans *tp;
+ struct xfs_mount *mp = ip->i_mount;
+ int error;
+
+ error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, 0, 0, 0, &tp);
+ if (error)
+ return error;
+
+ xfs_ilock(ip, XFS_MMAPLOCK_EXCL);
+ error = xfs_truncate_page(ip, XFS_ISIZE(ip), NULL, NULL);
+ xfs_iunlock(ip, XFS_MMAPLOCK_EXCL);
+
+ xfs_ilock(ip, XFS_ILOCK_EXCL);
+ xfs_trans_ijoin(tp, ip, 0);
+
+ /*
+ * We removing post EOF data, no need to update i_size
+ */
+ error = xfs_itruncate_extents(&tp, ip, XFS_DATA_FORK, XFS_ISIZE(ip));
+ if (error)
+ goto err_cancel;
+
+ error = xfs_trans_commit(tp);
+ if (error)
+ goto err_cancel;
+ xfs_iunlock(ip, XFS_ILOCK_EXCL);
+
+ return error;
+
+err_cancel:
+ xfs_iunlock(ip, XFS_ILOCK_EXCL);
+ xfs_trans_cancel(tp);
+ return error;
+}
+
+
+/*
+ * Prepare to enable fsverity by clearing old metadata.
+ */
+static int
+xfs_fsverity_begin_enable(
+ struct file *filp)
+{
+ struct inode *inode = file_inode(filp);
+ struct xfs_inode *ip = XFS_I(inode);
+ int error;
+
+ xfs_assert_ilocked(ip, XFS_IOLOCK_EXCL);
+
+ if (IS_DAX(inode))
+ return -EINVAL;
+
+ if (inode->i_size > XFS_FSVERITY_REGION_START)
+ return -EFBIG;
+
+ if (xfs_iflags_test_and_set(ip, XFS_VERITY_CONSTRUCTION))
+ return -EBUSY;
+
+ error = xfs_qm_dqattach(ip);
+ if (error)
+ return error;
+
+ /*
+ * Flush pagecache before building Merkle tree. Inode is locked and no
+ * further writes will happen to the file except fsverity metadata
+ */
+ error = filemap_write_and_wait(inode->i_mapping);
+ if (error)
+ return error;
+
+ return xfs_fsverity_delete_metadata(ip);
+}
+
+/*
+ * Complete (or fail) the process of enabling fsverity.
+ */
+static int
+xfs_fsverity_end_enable(
+ struct file *filp,
+ const void *desc,
+ size_t desc_size,
+ u64 merkle_tree_size)
+{
+ struct inode *inode = file_inode(filp);
+ struct xfs_inode *ip = XFS_I(inode);
+ struct xfs_mount *mp = ip->i_mount;
+ struct xfs_trans *tp;
+ int error = 0;
+
+ xfs_assert_ilocked(ip, XFS_IOLOCK_EXCL);
+
+ /* fs-verity failed, just cleanup */
+ if (desc == NULL)
+ goto out;
+
+ error = xfs_fsverity_write_descriptor(ip, desc, desc_size,
+ merkle_tree_size);
+ if (error)
+ goto out;
+
+ /*
+ * Wait for Merkle tree get written to disk before setting on-disk inode
+ * flag and clearing XFS_VERITY_CONSTRUCTION
+ */
+ error = filemap_write_and_wait(inode->i_mapping);
+ if (error)
+ goto out;
+
+ /*
+ * Set fsverity inode flag
+ */
+ error = xfs_trans_alloc_inode(ip, &M_RES(mp)->tr_ichange,
+ 0, 0, false, &tp);
+ if (error)
+ goto out;
+
+ /*
+ * Ensure that we've persisted the verity information before we enable
+ * it on the inode and tell the caller we have sealed the inode.
+ */
+ ip->i_diflags2 |= XFS_DIFLAG2_VERITY;
+
+ xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
+ xfs_trans_set_sync(tp);
+
+ error = xfs_trans_commit(tp);
+ xfs_iunlock(ip, XFS_ILOCK_EXCL);
+
+ if (!error)
+ inode->i_flags |= S_VERITY;
+
+out:
+ if (error) {
+ int error2;
+
+ error2 = xfs_fsverity_delete_metadata(ip);
+ if (error2)
+ xfs_alert(ip->i_mount,
+"ino 0x%llx failed to clean up new fsverity metadata, err %d",
+ ip->i_ino, error2);
+ }
+
+ xfs_iflags_clear(ip, XFS_VERITY_CONSTRUCTION);
+ return error;
+}
+
+/*
+ * Retrieve a merkle tree block.
+ */
+static struct page *
+xfs_fsverity_read_merkle(
+ struct inode *inode,
+ pgoff_t index,
+ unsigned long num_ra_pages)
+{
+ struct folio *folio;
+ pgoff_t offset =
+ index | (XFS_FSVERITY_REGION_START >> PAGE_SHIFT);
+
+ folio = __filemap_get_folio(inode->i_mapping, offset, FGP_ACCESSED, 0);
+ if (IS_ERR(folio) || !folio_test_uptodate(folio)) {
+ DEFINE_READAHEAD(ractl, NULL, NULL, inode->i_mapping, offset);
+
+ if (!IS_ERR(folio))
+ folio_put(folio);
+ else if (num_ra_pages > 1)
+ page_cache_ra_unbounded(&ractl, num_ra_pages, 0);
+ folio = read_mapping_folio(inode->i_mapping, offset, NULL);
+ if (IS_ERR(folio))
+ return ERR_CAST(folio);
+ }
+ return folio_file_page(folio, offset);
+}
+
+/*
+ * Write a merkle tree block.
+ */
+static int
+xfs_fsverity_write_merkle(
+ struct inode *inode,
+ const void *buf,
+ u64 pos,
+ unsigned int size)
+{
+ struct xfs_inode *ip = XFS_I(inode);
+ loff_t position = pos | XFS_FSVERITY_REGION_START;
+
+ if (position + size > inode->i_sb->s_maxbytes)
+ return -EFBIG;
+
+ return xfs_fsverity_write(ip, position, size, buf);
+}
+
+const ptrdiff_t info_offs = (int)offsetof(struct xfs_inode, i_verity_info) -
+ (int)offsetof(struct xfs_inode, i_vnode);
+
+const struct fsverity_operations xfs_fsverity_ops = {
+ .inode_info_offs = info_offs,
+ .begin_enable_verity = xfs_fsverity_begin_enable,
+ .end_enable_verity = xfs_fsverity_end_enable,
+ .get_verity_descriptor = xfs_fsverity_get_descriptor,
+ .read_merkle_tree_page = xfs_fsverity_read_merkle,
+ .write_merkle_tree_block = xfs_fsverity_write_merkle,
+};
diff --git a/fs/xfs/xfs_fsverity.h b/fs/xfs/xfs_fsverity.h
new file mode 100644
index 0000000000..8b0d7ef456
--- /dev/null
+++ b/fs/xfs/xfs_fsverity.h
@@ -0,0 +1,12 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2022 Red Hat, Inc.
+ */
+#ifndef __XFS_FSVERITY_H__
+#define __XFS_FSVERITY_H__
+
+#ifdef CONFIG_FS_VERITY
+extern const struct fsverity_operations xfs_fsverity_ops;
+#endif /* CONFIG_FS_VERITY */
+
+#endif /* __XFS_FSVERITY_H__ */
diff --git a/fs/xfs/xfs_message.c b/fs/xfs/xfs_message.c
index 19aba2c3d5..17f0f0ca7b 100644
--- a/fs/xfs/xfs_message.c
+++ b/fs/xfs/xfs_message.c
@@ -161,6 +161,10 @@
.opstate = XFS_OPSTATE_WARNED_ZONED,
.name = "zoned RT device",
},
+ [XFS_EXPERIMENTAL_FSVERITY] = {
+ .opstate = XFS_OPSTATE_WARNED_ZONED,
+ .name = "fsverity",
+ },
};
ASSERT(feat >= 0 && feat < XFS_EXPERIMENTAL_MAX);
BUILD_BUG_ON(ARRAY_SIZE(features) != XFS_EXPERIMENTAL_MAX);
diff --git a/fs/xfs/xfs_message.h b/fs/xfs/xfs_message.h
index d68e72379f..1647d32ea4 100644
--- a/fs/xfs/xfs_message.h
+++ b/fs/xfs/xfs_message.h
@@ -96,6 +96,7 @@
XFS_EXPERIMENTAL_LBS,
XFS_EXPERIMENTAL_METADIR,
XFS_EXPERIMENTAL_ZONED,
+ XFS_EXPERIMENTAL_FSVERITY,
XFS_EXPERIMENTAL_MAX,
};
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 10c6fc8d20..42a16b15a6 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -30,6 +30,7 @@
#include "xfs_filestream.h"
#include "xfs_quota.h"
#include "xfs_sysfs.h"
+#include "xfs_fsverity.h"
#include "xfs_ondisk.h"
#include "xfs_rmap_item.h"
#include "xfs_refcount_item.h"
@@ -54,6 +55,7 @@
#include <linux/fs_context.h>
#include <linux/fs_parser.h>
#include <linux/fsverity.h>
+#include <linux/iomap.h>
static const struct super_operations xfs_super_operations;
@@ -1706,6 +1708,9 @@
sb->s_quota_types = QTYPE_MASK_USR | QTYPE_MASK_GRP | QTYPE_MASK_PRJ;
#endif
sb->s_op = &xfs_super_operations;
+#ifdef CONFIG_FS_VERITY
+ sb->s_vop = &xfs_fsverity_ops;
+#endif
/*
* Delay mount work if the debug hook is set. This is debug
@@ -1959,10 +1964,19 @@
xfs_set_resuming_quotaon(mp);
mp->m_qflags &= ~XFS_QFLAGS_MNTOPTS;
+ if (xfs_has_verity(mp))
+ xfs_warn_experimental(mp, XFS_EXPERIMENTAL_FSVERITY);
+
error = xfs_mountfs(mp);
if (error)
goto out_filestream_unmount;
+#ifdef CONFIG_FS_VERITY
+ error = iomap_fsverity_init_bioset();
+ if (error)
+ goto out_unmount;
+#endif
+
root = igrab(VFS_I(mp->m_rootip));
if (!root) {
error = -ENOENT;
--
- Andrey
^ permalink raw reply related [flat|nested] 86+ messages in thread
* Re: [PATCH v2 16/22] xfs: add fs-verity support
2026-01-12 14:51 ` [PATCH v2 16/22] xfs: add fs-verity support Andrey Albershteyn
@ 2026-01-12 23:05 ` Darrick J. Wong
2026-01-13 18:32 ` Andrey Albershteyn
2026-01-16 14:52 ` Andrey Albershteyn
0 siblings, 2 replies; 86+ messages in thread
From: Darrick J. Wong @ 2026-01-12 23:05 UTC (permalink / raw)
To: Andrey Albershteyn
Cc: fsverity, linux-xfs, ebiggers, linux-fsdevel, aalbersh, david,
hch
On Mon, Jan 12, 2026 at 03:51:43PM +0100, Andrey Albershteyn wrote:
> Add integration with fs-verity. XFS stores the fs-verity descriptor and
> Merkle tree in the inode data fork at file offset (1 << 53).
>
> The Merkle tree is read and written through the iomap interface. The
> data itself is read into the inode's page cache. When XFS reads from this
> region, iomap doesn't call into fsverity to verify it against the Merkle
> tree. For data, verification is done on BIO completion in a workqueue.
>
> When fs-verity is enabled on an inode, the XFS_IVERITY_CONSTRUCTION
> flag is set, meaning that the Merkle tree is being built. The
> initialization ends with storing the verity descriptor and setting the
> on-disk inode flag (XFS_DIFLAG2_VERITY).
Might want to mention that XFS_DIFLAG2_VERITY sets S_VERITY, and that
XFS_IVERITY_CONSTRUCTION gets dropped after construction ends.
> The descriptor is stored in a new block after the last Merkle tree
> block. The size of the descriptor is stored at the end of the last
> descriptor block (the descriptor can span multiple blocks).
Huh, I would have thought the descriptor would go at 1<<53 and the
merkle tree goes immediately afterwards but eh, whatever. :)
> Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
> ---
> fs/xfs/Makefile | 1 +
> fs/xfs/xfs_bmap_util.c | 7 +
> fs/xfs/xfs_fsverity.c | 376 ++++++++++++++++++++++++++++++++++++++++++++++++++
> fs/xfs/xfs_fsverity.h | 12 +
> fs/xfs/xfs_message.c | 4 +
> fs/xfs/xfs_message.h | 1 +
> fs/xfs/xfs_super.c | 14 +
> 7 files changed, 415 insertions(+), 0 deletions(-)
>
> diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
> index 5bf501cf82..ad66439db7 100644
> --- a/fs/xfs/Makefile
> +++ b/fs/xfs/Makefile
> @@ -147,6 +147,7 @@
> xfs-$(CONFIG_SYSCTL) += xfs_sysctl.o
> xfs-$(CONFIG_COMPAT) += xfs_ioctl32.o
> xfs-$(CONFIG_EXPORTFS_BLOCK_OPS) += xfs_pnfs.o
> +xfs-$(CONFIG_FS_VERITY) += xfs_fsverity.o
>
> # notify failure
> ifeq ($(CONFIG_MEMORY_FAILURE),y)
> diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
> index 2208a720ec..79a255a3ac 100644
> --- a/fs/xfs/xfs_bmap_util.c
> +++ b/fs/xfs/xfs_bmap_util.c
> @@ -31,6 +31,7 @@
> #include "xfs_rtbitmap.h"
> #include "xfs_rtgroup.h"
> #include "xfs_zone_alloc.h"
> +#include <linux/fsverity.h>
>
> /* Kernel only BMAP related definitions and functions */
>
> @@ -554,6 +555,12 @@
> return false;
>
> /*
> + * Nothing to clean on fsverity inodes as they are read-only
> + */
> + if (IS_VERITY(VFS_I(ip)))
> + return false;
> +
> + /*
> * Check if there is an post-EOF extent to free. If there are any
> * delalloc blocks attached to the inode (data fork delalloc
> * reservations or CoW extents of any kind), we need to free them so
> diff --git a/fs/xfs/xfs_fsverity.c b/fs/xfs/xfs_fsverity.c
> new file mode 100644
> index 0000000000..691dc60778
> --- /dev/null
> +++ b/fs/xfs/xfs_fsverity.c
> @@ -0,0 +1,376 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Copyright (C) 2025 Red Hat, Inc.
> + */
> +#include "xfs.h"
> +#include "xfs_shared.h"
> +#include "xfs_format.h"
> +#include "xfs_trans_resv.h"
> +#include "xfs_mount.h"
> +#include "xfs_da_format.h"
> +#include "xfs_da_btree.h"
> +#include "xfs_inode.h"
> +#include "xfs_log_format.h"
> +#include "xfs_bmap_util.h"
> +#include "xfs_log_format.h"
> +#include "xfs_trans.h"
> +#include "xfs_trace.h"
> +#include "xfs_quota.h"
> +#include "xfs_fsverity.h"
> +#include "xfs_iomap.h"
> +#include <linux/fsverity.h>
> +#include <linux/pagemap.h>
> +
> +static int
> +xfs_fsverity_read(
> + struct inode *inode,
> + void *buf,
> + size_t count,
> + loff_t pos)
> +{
> + struct folio *folio;
> + size_t n;
> +
> + while (count) {
> + folio = read_mapping_folio(inode->i_mapping, pos >> PAGE_SHIFT,
> + NULL);
> + if (IS_ERR(folio))
> + return PTR_ERR(folio);
> +
> + n = memcpy_from_file_folio(buf, folio, pos, count);
> + folio_put(folio);
> +
> + buf += n;
> + pos += n;
> + count -= n;
> + }
> + return 0;
> +}
> +
> +static int
> +xfs_fsverity_write(
> + struct xfs_inode *ip,
> + loff_t pos,
> + size_t length,
> + const void *buf)
> +{
> + int ret;
> + struct iov_iter iter;
> + struct kvec kvec = {
> + .iov_base = (void *)buf,
> + .iov_len = length,
> + };
> + struct kiocb iocb = {
> + /*
> + * We don't have file here, but iomap_file_buffered_write uses
> + * it only to obtain inode, so, pass inode as private arg
> + * directly
> + */
Oh, it occurs to me that fsverity doesn't take the struct file and pass
it through to the per-fs implementation. And that's why you had to
resort to the trick of passing the struct inode in as "private" earlier.
I think that needs to get fixed, unfortunately it's a treewide change.
What if you wrote an iomap_write_iter wrapper that skips all the iocb
junk and just takes the inode/pos/len directly?
> + .ki_filp = NULL,
> + .ki_ioprio = get_current_ioprio(),
> + .ki_pos = pos,
> + };
> +
> + iov_iter_kvec(&iter, WRITE, &kvec, 1, length);
> + ret = iomap_file_buffered_write(&iocb, &iter,
> + &xfs_buffered_write_iomap_ops, &xfs_iomap_write_ops,
> + VFS_I(ip));
> + if (ret < 0)
> + return ret;
> +
> + return 0;
> +}
> +
> +/*
> + * Retrieve the verity descriptor.
> + */
> +static int
> +xfs_fsverity_get_descriptor(
> + struct inode *inode,
> + void *buf,
> + size_t buf_size)
> +{
> + struct xfs_inode *ip = XFS_I(inode);
> + __be32 d_desc_size;
> + u32 desc_size;
> + u64 desc_size_pos;
> + int error;
> + u64 desc_pos;
> + struct xfs_bmbt_irec rec;
> + int is_empty;
> + uint32_t blocksize = i_blocksize(VFS_I(ip));
> + xfs_fileoff_t last_block;
> +
> + ASSERT(inode->i_flags & S_VERITY);
> + error = xfs_bmap_last_extent(NULL, ip, XFS_DATA_FORK, &rec, &is_empty);
> + if (error)
> + return error;
> +
> + if (is_empty)
> + return -ENODATA;
> +
> + last_block = (rec.br_startoff + rec.br_blockcount);
> + desc_size_pos = (last_block << ip->i_mount->m_sb.sb_blocklog) -
> + sizeof(__be32);
> + error = xfs_fsverity_read(inode, (char *)&d_desc_size,
> + sizeof(d_desc_size), desc_size_pos);
> + if (error)
> + return error;
> +
> + desc_size = be32_to_cpu(d_desc_size);
> + if (desc_size > FS_VERITY_MAX_DESCRIPTOR_SIZE || desc_size > desc_size_pos)
> + return -ERANGE;
> +
> + if (!buf_size)
> + return desc_size;
> +
> + if (desc_size > buf_size)
> + return -ERANGE;
> +
> + desc_pos = round_down(desc_size_pos - desc_size, blocksize);
> + error = xfs_fsverity_read(inode, buf, desc_size, desc_pos);
> + if (error)
> + return error;
> +
> + return desc_size;
> +}
You might want to wrap the integrity checks through XFS_IS_CORRUPT so
that we get some logging on corrupt fsverity data. Also, if descriptor
corruption doesn't prevent iget from completing, then we ought to define
a new health state for the xfs_inode so that it can report those kinds
of failures via bulkstat.
> +
> +static int
> +xfs_fsverity_write_descriptor(
> + struct xfs_inode *ip,
> + const void *desc,
> + u32 desc_size,
> + u64 merkle_tree_size)
> +{
> + int error;
> + unsigned int blksize = ip->i_mount->m_attr_geo->blksize;
> + u64 desc_pos = round_up(
> + XFS_FSVERITY_REGION_START | merkle_tree_size, blksize);
> + u64 desc_end = desc_pos + desc_size;
> + __be32 desc_size_disk = cpu_to_be32(desc_size);
> + u64 desc_size_pos =
> + round_up(desc_end + sizeof(desc_size_disk), blksize) -
> + sizeof(desc_size_disk);
> +
> + error = xfs_fsverity_write(ip, desc_size_pos,
> + sizeof(__be32),
> + (const void *)&desc_size_disk);
> + if (error)
> + return error;
> +
> + error = xfs_fsverity_write(ip, desc_pos, desc_size, desc);
> +
> + return error;
> +}
> +
> +/*
> + * Try to remove all the fsverity metadata after a failed enablement.
> + */
> +static int
> +xfs_fsverity_delete_metadata(
> + struct xfs_inode *ip)
> +{
> + struct xfs_trans *tp;
> + struct xfs_mount *mp = ip->i_mount;
> + int error;
> +
> + error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, 0, 0, 0, &tp);
> + if (error)
> + return error;
> +
> + xfs_ilock(ip, XFS_MMAPLOCK_EXCL);
The MMAPLOCK should be taken before the transaction allocation like
everything else in xfs.
> + error = xfs_truncate_page(ip, XFS_ISIZE(ip), NULL, NULL);
> + xfs_iunlock(ip, XFS_MMAPLOCK_EXCL);
> +
> + xfs_ilock(ip, XFS_ILOCK_EXCL);
> + xfs_trans_ijoin(tp, ip, 0);
> +
> + /*
> + * We removing post EOF data, no need to update i_size
> + */
> + error = xfs_itruncate_extents(&tp, ip, XFS_DATA_FORK, XFS_ISIZE(ip));
> + if (error)
> + goto err_cancel;
> +
> + error = xfs_trans_commit(tp);
> + if (error)
> + goto err_cancel;
> + xfs_iunlock(ip, XFS_ILOCK_EXCL);
> +
> + return error;
> +
> +err_cancel:
> + xfs_iunlock(ip, XFS_ILOCK_EXCL);
> + xfs_trans_cancel(tp);
> + return error;
> +}
> +
> +
> +/*
> + * Prepare to enable fsverity by clearing old metadata.
> + */
> +static int
> +xfs_fsverity_begin_enable(
> + struct file *filp)
> +{
> + struct inode *inode = file_inode(filp);
> + struct xfs_inode *ip = XFS_I(inode);
> + int error;
> +
> + xfs_assert_ilocked(ip, XFS_IOLOCK_EXCL);
> +
> + if (IS_DAX(inode))
> + return -EINVAL;
> +
> + if (inode->i_size > XFS_FSVERITY_REGION_START)
> + return -EFBIG;
> +
> + if (xfs_iflags_test_and_set(ip, XFS_VERITY_CONSTRUCTION))
> + return -EBUSY;
> +
> + error = xfs_qm_dqattach(ip);
> + if (error)
> + return error;
> +
> + /*
> + * Flush pagecache before building Merkle tree. Inode is locked and no
> + * further writes will happen to the file except fsverity metadata
Don't we need to take the MMAPLOCK to prevent concurrent write faults?
> + */
> + error = filemap_write_and_wait(inode->i_mapping);
> + if (error)
> + return error;
> +
> + return xfs_fsverity_delete_metadata(ip);
> +}
> +
> +/*
> + * Complete (or fail) the process of enabling fsverity.
> + */
> +static int
> +xfs_fsverity_end_enable(
> + struct file *filp,
> + const void *desc,
> + size_t desc_size,
> + u64 merkle_tree_size)
> +{
> + struct inode *inode = file_inode(filp);
> + struct xfs_inode *ip = XFS_I(inode);
> + struct xfs_mount *mp = ip->i_mount;
> + struct xfs_trans *tp;
> + int error = 0;
> +
> + xfs_assert_ilocked(ip, XFS_IOLOCK_EXCL);
> +
> + /* fs-verity failed, just cleanup */
> + if (desc == NULL)
> + goto out;
> +
> + error = xfs_fsverity_write_descriptor(ip, desc, desc_size,
> + merkle_tree_size);
> + if (error)
> + goto out;
> +
> + /*
> + * Wait for Merkle tree get written to disk before setting on-disk inode
> + * flag and clearing XFS_VERITY_CONSTRUCTION
> + */
> + error = filemap_write_and_wait(inode->i_mapping);
> + if (error)
> + goto out;
> +
> + /*
> + * Set fsverity inode flag
> + */
> + error = xfs_trans_alloc_inode(ip, &M_RES(mp)->tr_ichange,
> + 0, 0, false, &tp);
> + if (error)
> + goto out;
> +
> + /*
> + * Ensure that we've persisted the verity information before we enable
> + * it on the inode and tell the caller we have sealed the inode.
> + */
> + ip->i_diflags2 |= XFS_DIFLAG2_VERITY;
> +
> + xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
> + xfs_trans_set_sync(tp);
> +
> + error = xfs_trans_commit(tp);
> + xfs_iunlock(ip, XFS_ILOCK_EXCL);
> +
> + if (!error)
> + inode->i_flags |= S_VERITY;
> +
> +out:
> + if (error) {
> + int error2;
> +
> + error2 = xfs_fsverity_delete_metadata(ip);
> + if (error2)
> + xfs_alert(ip->i_mount,
> +"ino 0x%llx failed to clean up new fsverity metadata, err %d",
> + ip->i_ino, error2);
> + }
> +
> + xfs_iflags_clear(ip, XFS_VERITY_CONSTRUCTION);
> + return error;
> +}
> +
> +/*
> + * Retrieve a merkle tree block.
> + */
> +static struct page *
> +xfs_fsverity_read_merkle(
> + struct inode *inode,
> + pgoff_t index,
> + unsigned long num_ra_pages)
> +{
> + struct folio *folio;
> + pgoff_t offset =
> + index | (XFS_FSVERITY_REGION_START >> PAGE_SHIFT);
> +
> + folio = __filemap_get_folio(inode->i_mapping, offset, FGP_ACCESSED, 0);
> + if (IS_ERR(folio) || !folio_test_uptodate(folio)) {
> + DEFINE_READAHEAD(ractl, NULL, NULL, inode->i_mapping, offset);
> +
> + if (!IS_ERR(folio))
> + folio_put(folio);
> + else if (num_ra_pages > 1)
> + page_cache_ra_unbounded(&ractl, num_ra_pages, 0);
> + folio = read_mapping_folio(inode->i_mapping, offset, NULL);
> + if (IS_ERR(folio))
> + return ERR_CAST(folio);
> + }
> + return folio_file_page(folio, offset);
Shouldn't this be some _BEYOND_EOF variant of generic_file_read_iter?
> +}
> +
> +/*
> + * Write a merkle tree block.
> + */
> +static int
> +xfs_fsverity_write_merkle(
> + struct inode *inode,
> + const void *buf,
> + u64 pos,
> + unsigned int size)
> +{
> + struct xfs_inode *ip = XFS_I(inode);
> + loff_t position = pos | XFS_FSVERITY_REGION_START;
> +
> + if (position + size > inode->i_sb->s_maxbytes)
> + return -EFBIG;
> +
> + return xfs_fsverity_write(ip, position, size, buf);
> +}
> +
> +const ptrdiff_t info_offs = (int)offsetof(struct xfs_inode, i_verity_info) -
> + (int)offsetof(struct xfs_inode, i_vnode);
I ... wow.
Not blaming you for writing this, just surprised that the common code
makes you do that.
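
For readers unfamiliar with the idiom, here is a minimal userspace sketch of what a signed "info offset" lets generic code do: step from a pointer to the embedded VFS inode to a sibling field of the containing filesystem inode. All struct and function names below are hypothetical stand-ins, not the kernel's types, and how the common fsverity code actually consumes the offset is an assumption here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct inode and struct fsverity_info. */
struct model_vfs_inode {
	long i_flags;
};
struct model_verity_info;

/* Stand-in for a filesystem inode that embeds the VFS inode. */
struct model_fs_inode {
	struct model_verity_info *i_verity_info;
	long other_state;
	struct model_vfs_inode i_vnode;
};

/*
 * Signed distance from the embedded VFS inode to the verity pointer.
 * It is negative here because i_verity_info sits before i_vnode.
 */
static const ptrdiff_t model_info_offs =
	(ptrdiff_t)offsetof(struct model_fs_inode, i_verity_info) -
	(ptrdiff_t)offsetof(struct model_fs_inode, i_vnode);

/* Generic code that only knows the VFS inode and the offset. */
struct model_verity_info **model_verity_slot(struct model_vfs_inode *inode)
{
	assert(inode != NULL);
	return (struct model_verity_info **)((char *)inode + model_info_offs);
}
```

Given a struct model_fs_inode ip, model_verity_slot(&ip.i_vnode) yields &ip.i_verity_info without the caller knowing the layout of the containing struct.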
> +const struct fsverity_operations xfs_fsverity_ops = {
> + .inode_info_offs = info_offs,
> + .begin_enable_verity = xfs_fsverity_begin_enable,
> + .end_enable_verity = xfs_fsverity_end_enable,
> + .get_verity_descriptor = xfs_fsverity_get_descriptor,
> + .read_merkle_tree_page = xfs_fsverity_read_merkle,
> + .write_merkle_tree_block = xfs_fsverity_write_merkle,
> +};
> diff --git a/fs/xfs/xfs_fsverity.h b/fs/xfs/xfs_fsverity.h
> new file mode 100644
> index 0000000000..8b0d7ef456
> --- /dev/null
> +++ b/fs/xfs/xfs_fsverity.h
> @@ -0,0 +1,12 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Copyright (C) 2022 Red Hat, Inc.
> + */
> +#ifndef __XFS_FSVERITY_H__
> +#define __XFS_FSVERITY_H__
> +
> +#ifdef CONFIG_FS_VERITY
> +extern const struct fsverity_operations xfs_fsverity_ops;
> +#endif /* CONFIG_FS_VERITY */
> +
> +#endif /* __XFS_FSVERITY_H__ */
> diff --git a/fs/xfs/xfs_message.c b/fs/xfs/xfs_message.c
> index 19aba2c3d5..17f0f0ca7b 100644
> --- a/fs/xfs/xfs_message.c
> +++ b/fs/xfs/xfs_message.c
> @@ -161,6 +161,10 @@
> .opstate = XFS_OPSTATE_WARNED_ZONED,
> .name = "zoned RT device",
> },
> + [XFS_EXPERIMENTAL_FSVERITY] = {
> + .opstate = XFS_OPSTATE_WARNED_ZONED,
> + .name = "fsverity",
> + },
> };
> ASSERT(feat >= 0 && feat < XFS_EXPERIMENTAL_MAX);
> BUILD_BUG_ON(ARRAY_SIZE(features) != XFS_EXPERIMENTAL_MAX);
> diff --git a/fs/xfs/xfs_message.h b/fs/xfs/xfs_message.h
> index d68e72379f..1647d32ea4 100644
> --- a/fs/xfs/xfs_message.h
> +++ b/fs/xfs/xfs_message.h
> @@ -96,6 +96,7 @@
> XFS_EXPERIMENTAL_LBS,
> XFS_EXPERIMENTAL_METADIR,
> XFS_EXPERIMENTAL_ZONED,
> + XFS_EXPERIMENTAL_FSVERITY,
>
> XFS_EXPERIMENTAL_MAX,
> };
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index 10c6fc8d20..42a16b15a6 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -30,6 +30,7 @@
> #include "xfs_filestream.h"
> #include "xfs_quota.h"
> #include "xfs_sysfs.h"
> +#include "xfs_fsverity.h"
> #include "xfs_ondisk.h"
> #include "xfs_rmap_item.h"
> #include "xfs_refcount_item.h"
> @@ -54,6 +55,7 @@
> #include <linux/fs_context.h>
> #include <linux/fs_parser.h>
> #include <linux/fsverity.h>
> +#include <linux/iomap.h>
>
> static const struct super_operations xfs_super_operations;
>
> @@ -1706,6 +1708,9 @@
> sb->s_quota_types = QTYPE_MASK_USR | QTYPE_MASK_GRP | QTYPE_MASK_PRJ;
> #endif
> sb->s_op = &xfs_super_operations;
> +#ifdef CONFIG_FS_VERITY
> + sb->s_vop = &xfs_fsverity_ops;
> +#endif
>
> /*
> * Delay mount work if the debug hook is set. This is debug
> @@ -1959,10 +1964,19 @@
> xfs_set_resuming_quotaon(mp);
> mp->m_qflags &= ~XFS_QFLAGS_MNTOPTS;
>
> + if (xfs_has_verity(mp))
> + xfs_warn_experimental(mp, XFS_EXPERIMENTAL_FSVERITY);
> +
> error = xfs_mountfs(mp);
> if (error)
> goto out_filestream_unmount;
>
> +#ifdef CONFIG_FS_VERITY
> + error = iomap_fsverity_init_bioset();
if (xfs_has_verity()) ?
--D
> + if (error)
> + goto out_unmount;
> +#endif
> +
> root = igrab(VFS_I(mp->m_rootip));
> if (!root) {
> error = -ENOENT;
>
> --
> - Andrey
>
>
* Re: [PATCH v2 16/22] xfs: add fs-verity support
2026-01-12 23:05 ` Darrick J. Wong
@ 2026-01-13 18:32 ` Andrey Albershteyn
2026-01-14 16:40 ` Darrick J. Wong
2026-01-16 14:52 ` Andrey Albershteyn
1 sibling, 1 reply; 86+ messages in thread
From: Andrey Albershteyn @ 2026-01-13 18:32 UTC (permalink / raw)
To: Darrick J. Wong
Cc: fsverity, linux-xfs, ebiggers, linux-fsdevel, aalbersh, david,
hch
On 2026-01-12 15:05:48, Darrick J. Wong wrote:
> On Mon, Jan 12, 2026 at 03:51:43PM +0100, Andrey Albershteyn wrote:
> > Add integration with fs-verity. XFS stores fs-verity descriptor and
> > Merkle tree in the inode data fork at offset file offset (1 << 53).
> >
> > The Merkle tree reading/writing is done through iomap interface. The
> > data itself are read to the inode's page cache. When XFS reads from this
> > region iomap doesn't call into fsverity to verify it against Merkle
> > tree. For data, verification is done on BIO completion in a workqueue.
> >
> > When fs-verity is enabled on an inode, the XFS_IVERITY_CONSTRUCTION
> > flag is set meaning that the Merkle tree is being build. The
> > initialization ends with storing of verity descriptor and setting
> > inode on-disk flag (XFS_DIFLAG2_VERITY).
>
> Might want to mention that XFS_DIFLAG2_VERITY sets S_VERITY, and that
> XFS_IVERITY_CONSTRUCTION gets dropped after construction ends.
added
>
> > The descriptor is stored in a new block after the last Merkle tree
> > block. The size of the descriptor is stored at the end of the last
> > descriptor block (descriptor can be multiple blocks).
>
> Huh, I would have thought the descriptor would go at 1<<53 and the
> merkle tree goes immediately afterwards but eh, whatever. :)
Yeah, I did it that way initially, as it makes both the descriptor and
the tree easier to find. But it would cause problems if the descriptor
ever grew in size (current max is 16k).
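
As a worked illustration of the layout being discussed, the following userspace sketch mirrors the position arithmetic from xfs_fsverity_write_descriptor() and xfs_fsverity_get_descriptor() as posted. The model_* names are hypothetical. The sketch assumes a power-of-two block size, the 1 << 53 region start from the commit message, and that the reader already knows where the size field is (the patch derives that from the end of the last extent).

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_REGION_START	(1ULL << 53)	/* offset from the commit message */
#define MODEL_SIZE_FIELD	4ULL		/* sizeof(__be32) */

static uint64_t model_round_down(uint64_t x, uint64_t align)
{
	/* Only power-of-two alignments are modelled. */
	assert(align != 0 && (align & (align - 1)) == 0);
	return x & ~(align - 1);
}

static uint64_t model_round_up(uint64_t x, uint64_t align)
{
	return model_round_down(x + align - 1, align);
}

struct model_layout {
	uint64_t desc_pos;	/* where the descriptor starts */
	uint64_t size_pos;	/* where the 4-byte size field starts */
};

/* Writer side: descriptor in a new block after the tree, size at block end. */
struct model_layout model_write_layout(uint64_t merkle_tree_size,
				       uint32_t desc_size, uint32_t blksize)
{
	struct model_layout l;
	uint64_t desc_end;

	l.desc_pos = model_round_up(MODEL_REGION_START | merkle_tree_size,
				    blksize);
	desc_end = l.desc_pos + desc_size;
	l.size_pos = model_round_up(desc_end + MODEL_SIZE_FIELD, blksize) -
		     MODEL_SIZE_FIELD;
	return l;
}

/* Reader side: walk back from the size field to the descriptor start. */
uint64_t model_read_desc_pos(uint64_t size_pos, uint32_t desc_size,
			     uint32_t blksize)
{
	return model_round_down(size_pos - desc_size, blksize);
}
```

With a 4096-byte block, an 8192-byte tree and a 256-byte descriptor, the model places the descriptor at region start + 8192 and the size field at region start + 12284, and the reader side recovers the same descriptor start. This is a check of a couple of sizes, not a proof that every descriptor length round-trips.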
>
> > Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
> > ---
> > fs/xfs/Makefile | 1 +
> > fs/xfs/xfs_bmap_util.c | 7 +
> > fs/xfs/xfs_fsverity.c | 376 ++++++++++++++++++++++++++++++++++++++++++++++++++
> > fs/xfs/xfs_fsverity.h | 12 +
> > fs/xfs/xfs_message.c | 4 +
> > fs/xfs/xfs_message.h | 1 +
> > fs/xfs/xfs_super.c | 14 +
> > 7 files changed, 415 insertions(+), 0 deletions(-)
> >
> > diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
> > index 5bf501cf82..ad66439db7 100644
> > --- a/fs/xfs/Makefile
> > +++ b/fs/xfs/Makefile
> > @@ -147,6 +147,7 @@
> > xfs-$(CONFIG_SYSCTL) += xfs_sysctl.o
> > xfs-$(CONFIG_COMPAT) += xfs_ioctl32.o
> > xfs-$(CONFIG_EXPORTFS_BLOCK_OPS) += xfs_pnfs.o
> > +xfs-$(CONFIG_FS_VERITY) += xfs_fsverity.o
> >
> > # notify failure
> > ifeq ($(CONFIG_MEMORY_FAILURE),y)
> > diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
> > index 2208a720ec..79a255a3ac 100644
> > --- a/fs/xfs/xfs_bmap_util.c
> > +++ b/fs/xfs/xfs_bmap_util.c
> > @@ -31,6 +31,7 @@
> > #include "xfs_rtbitmap.h"
> > #include "xfs_rtgroup.h"
> > #include "xfs_zone_alloc.h"
> > +#include <linux/fsverity.h>
> >
> > /* Kernel only BMAP related definitions and functions */
> >
> > @@ -554,6 +555,12 @@
> > return false;
> >
> > /*
> > + * Nothing to clean on fsverity inodes as they are read-only
> > + */
> > + if (IS_VERITY(VFS_I(ip)))
> > + return false;
> > +
> > + /*
> > * Check if there is an post-EOF extent to free. If there are any
> > * delalloc blocks attached to the inode (data fork delalloc
> > * reservations or CoW extents of any kind), we need to free them so
> > diff --git a/fs/xfs/xfs_fsverity.c b/fs/xfs/xfs_fsverity.c
> > new file mode 100644
> > index 0000000000..691dc60778
> > --- /dev/null
> > +++ b/fs/xfs/xfs_fsverity.c
> > @@ -0,0 +1,376 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +/*
> > + * Copyright (C) 2025 Red Hat, Inc.
> > + */
> > +#include "xfs.h"
> > +#include "xfs_shared.h"
> > +#include "xfs_format.h"
> > +#include "xfs_trans_resv.h"
> > +#include "xfs_mount.h"
> > +#include "xfs_da_format.h"
> > +#include "xfs_da_btree.h"
> > +#include "xfs_inode.h"
> > +#include "xfs_log_format.h"
> > +#include "xfs_bmap_util.h"
> > +#include "xfs_log_format.h"
> > +#include "xfs_trans.h"
> > +#include "xfs_trace.h"
> > +#include "xfs_quota.h"
> > +#include "xfs_fsverity.h"
> > +#include "xfs_iomap.h"
> > +#include <linux/fsverity.h>
> > +#include <linux/pagemap.h>
> > +
> > +static int
> > +xfs_fsverity_read(
> > + struct inode *inode,
> > + void *buf,
> > + size_t count,
> > + loff_t pos)
> > +{
> > + struct folio *folio;
> > + size_t n;
> > +
> > + while (count) {
> > + folio = read_mapping_folio(inode->i_mapping, pos >> PAGE_SHIFT,
> > + NULL);
> > + if (IS_ERR(folio))
> > + return PTR_ERR(folio);
> > +
> > + n = memcpy_from_file_folio(buf, folio, pos, count);
> > + folio_put(folio);
> > +
> > + buf += n;
> > + pos += n;
> > + count -= n;
> > + }
> > + return 0;
> > +}
> > +
> > +static int
> > +xfs_fsverity_write(
> > + struct xfs_inode *ip,
> > + loff_t pos,
> > + size_t length,
> > + const void *buf)
> > +{
> > + int ret;
> > + struct iov_iter iter;
> > + struct kvec kvec = {
> > + .iov_base = (void *)buf,
> > + .iov_len = length,
> > + };
> > + struct kiocb iocb = {
> > + /*
> > + * We don't have file here, but iomap_file_buffered_write uses
> > + * it only to obtain inode, so, pass inode as private arg
> > + * directly
> > + */
>
> Oh, it occurs to me that fsverity doesn't take the struct file and pass
> it through to the per-fs implementation. And that's why you had to
> resort to the trick of passing the struct inode in as "private" earlier.
>
> I think that needs to get fixed; unfortunately, it's a treewide change.
> What if you wrote an iomap_write_iter wrapper that skips all the iocb
> junk and just takes the inode/pos/len directly?
I will add a wrapper for now
>
> > + .ki_filp = NULL,
> > + .ki_ioprio = get_current_ioprio(),
> > + .ki_pos = pos,
> > + };
> > +
> > + iov_iter_kvec(&iter, WRITE, &kvec, 1, length);
> > + ret = iomap_file_buffered_write(&iocb, &iter,
> > + &xfs_buffered_write_iomap_ops, &xfs_iomap_write_ops,
> > + VFS_I(ip));
> > + if (ret < 0)
> > + return ret;
> > +
> > + return 0;
> > +}
> > +
> > +/*
> > + * Retrieve the verity descriptor.
> > + */
> > +static int
> > +xfs_fsverity_get_descriptor(
> > + struct inode *inode,
> > + void *buf,
> > + size_t buf_size)
> > +{
> > + struct xfs_inode *ip = XFS_I(inode);
> > + __be32 d_desc_size;
> > + u32 desc_size;
> > + u64 desc_size_pos;
> > + int error;
> > + u64 desc_pos;
> > + struct xfs_bmbt_irec rec;
> > + int is_empty;
> > + uint32_t blocksize = i_blocksize(VFS_I(ip));
> > + xfs_fileoff_t last_block;
> > +
> > + ASSERT(inode->i_flags & S_VERITY);
> > + error = xfs_bmap_last_extent(NULL, ip, XFS_DATA_FORK, &rec, &is_empty);
> > + if (error)
> > + return error;
> > +
> > + if (is_empty)
> > + return -ENODATA;
> > +
> > + last_block = (rec.br_startoff + rec.br_blockcount);
> > + desc_size_pos = (last_block << ip->i_mount->m_sb.sb_blocklog) -
> > + sizeof(__be32);
> > + error = xfs_fsverity_read(inode, (char *)&d_desc_size,
> > + sizeof(d_desc_size), desc_size_pos);
> > + if (error)
> > + return error;
> > +
> > + desc_size = be32_to_cpu(d_desc_size);
> > + if (desc_size > FS_VERITY_MAX_DESCRIPTOR_SIZE || desc_size > desc_size_pos)
> > + return -ERANGE;
> > +
> > + if (!buf_size)
> > + return desc_size;
> > +
> > + if (desc_size > buf_size)
> > + return -ERANGE;
> > +
> > + desc_pos = round_down(desc_size_pos - desc_size, blocksize);
> > + error = xfs_fsverity_read(inode, buf, desc_size, desc_pos);
> > + if (error)
> > + return error;
> > +
> > + return desc_size;
> > +}
>
> You might want to wrap the integrity checks through XFS_IS_CORRUPT so
> that we get some logging on corrupt fsverity data. Also, if descriptor
> corruption doesn't prevent iget from completing, then we ought to define
> a new health state for the xfs_inode so that it can report those kinds
> of failures via bulkstat.
sure, thanks! I will add XFS_IS_CORRUPT
>
> > +
> > +static int
> > +xfs_fsverity_write_descriptor(
> > + struct xfs_inode *ip,
> > + const void *desc,
> > + u32 desc_size,
> > + u64 merkle_tree_size)
> > +{
> > + int error;
> > + unsigned int blksize = ip->i_mount->m_attr_geo->blksize;
> > + u64 desc_pos = round_up(
> > + XFS_FSVERITY_REGION_START | merkle_tree_size, blksize);
> > + u64 desc_end = desc_pos + desc_size;
> > + __be32 desc_size_disk = cpu_to_be32(desc_size);
> > + u64 desc_size_pos =
> > + round_up(desc_end + sizeof(desc_size_disk), blksize) -
> > + sizeof(desc_size_disk);
> > +
> > + error = xfs_fsverity_write(ip, desc_size_pos,
> > + sizeof(__be32),
> > + (const void *)&desc_size_disk);
> > + if (error)
> > + return error;
> > +
> > + error = xfs_fsverity_write(ip, desc_pos, desc_size, desc);
> > +
> > + return error;
> > +}
> > +
> > +/*
> > + * Try to remove all the fsverity metadata after a failed enablement.
> > + */
> > +static int
> > +xfs_fsverity_delete_metadata(
> > + struct xfs_inode *ip)
> > +{
> > + struct xfs_trans *tp;
> > + struct xfs_mount *mp = ip->i_mount;
> > + int error;
> > +
> > + error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, 0, 0, 0, &tp);
> > + if (error)
> > + return error;
> > +
> > + xfs_ilock(ip, XFS_MMAPLOCK_EXCL);
>
> The MMAPLOCK should be taken before the transaction allocation like
> everything else in xfs.
hmm, but is it needed for the transaction? I meant to wrap only
xfs_truncate_page() with the MMAPLOCK
>
> > + error = xfs_truncate_page(ip, XFS_ISIZE(ip), NULL, NULL);
> > + xfs_iunlock(ip, XFS_MMAPLOCK_EXCL);
> > +
> > + xfs_ilock(ip, XFS_ILOCK_EXCL);
> > + xfs_trans_ijoin(tp, ip, 0);
> > +
> > + /*
> > + * We removing post EOF data, no need to update i_size
> > + */
> > + error = xfs_itruncate_extents(&tp, ip, XFS_DATA_FORK, XFS_ISIZE(ip));
> > + if (error)
> > + goto err_cancel;
> > +
> > + error = xfs_trans_commit(tp);
> > + if (error)
> > + goto err_cancel;
> > + xfs_iunlock(ip, XFS_ILOCK_EXCL);
> > +
> > + return error;
> > +
> > +err_cancel:
> > + xfs_iunlock(ip, XFS_ILOCK_EXCL);
> > + xfs_trans_cancel(tp);
> > + return error;
> > +}
> > +
> > +
> > +/*
> > + * Prepare to enable fsverity by clearing old metadata.
> > + */
> > +static int
> > +xfs_fsverity_begin_enable(
> > + struct file *filp)
> > +{
> > + struct inode *inode = file_inode(filp);
> > + struct xfs_inode *ip = XFS_I(inode);
> > + int error;
> > +
> > + xfs_assert_ilocked(ip, XFS_IOLOCK_EXCL);
> > +
> > + if (IS_DAX(inode))
> > + return -EINVAL;
> > +
> > + if (inode->i_size > XFS_FSVERITY_REGION_START)
> > + return -EFBIG;
> > +
> > + if (xfs_iflags_test_and_set(ip, XFS_VERITY_CONSTRUCTION))
> > + return -EBUSY;
> > +
> > + error = xfs_qm_dqattach(ip);
> > + if (error)
> > + return error;
> > +
> > + /*
> > + * Flush pagecache before building Merkle tree. Inode is locked and no
> > + * further writes will happen to the file except fsverity metadata
>
> Don't we need to take the MMAPLOCK to prevent concurrent write faults?
will add it
>
> > + */
> > + error = filemap_write_and_wait(inode->i_mapping);
> > + if (error)
> > + return error;
> > +
> > + return xfs_fsverity_delete_metadata(ip);
> > +}
> > +
> > +/*
> > + * Complete (or fail) the process of enabling fsverity.
> > + */
> > +static int
> > +xfs_fsverity_end_enable(
> > + struct file *filp,
> > + const void *desc,
> > + size_t desc_size,
> > + u64 merkle_tree_size)
> > +{
> > + struct inode *inode = file_inode(filp);
> > + struct xfs_inode *ip = XFS_I(inode);
> > + struct xfs_mount *mp = ip->i_mount;
> > + struct xfs_trans *tp;
> > + int error = 0;
> > +
> > + xfs_assert_ilocked(ip, XFS_IOLOCK_EXCL);
> > +
> > + /* fs-verity failed, just cleanup */
> > + if (desc == NULL)
> > + goto out;
> > +
> > + error = xfs_fsverity_write_descriptor(ip, desc, desc_size,
> > + merkle_tree_size);
> > + if (error)
> > + goto out;
> > +
> > + /*
> > + * Wait for Merkle tree get written to disk before setting on-disk inode
> > + * flag and clearing XFS_VERITY_CONSTRUCTION
> > + */
> > + error = filemap_write_and_wait(inode->i_mapping);
> > + if (error)
> > + goto out;
> > +
> > + /*
> > + * Set fsverity inode flag
> > + */
> > + error = xfs_trans_alloc_inode(ip, &M_RES(mp)->tr_ichange,
> > + 0, 0, false, &tp);
> > + if (error)
> > + goto out;
> > +
> > + /*
> > + * Ensure that we've persisted the verity information before we enable
> > + * it on the inode and tell the caller we have sealed the inode.
> > + */
> > + ip->i_diflags2 |= XFS_DIFLAG2_VERITY;
> > +
> > + xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
> > + xfs_trans_set_sync(tp);
> > +
> > + error = xfs_trans_commit(tp);
> > + xfs_iunlock(ip, XFS_ILOCK_EXCL);
> > +
> > + if (!error)
> > + inode->i_flags |= S_VERITY;
> > +
> > +out:
> > + if (error) {
> > + int error2;
> > +
> > + error2 = xfs_fsverity_delete_metadata(ip);
> > + if (error2)
> > + xfs_alert(ip->i_mount,
> > +"ino 0x%llx failed to clean up new fsverity metadata, err %d",
> > + ip->i_ino, error2);
> > + }
> > +
> > + xfs_iflags_clear(ip, XFS_VERITY_CONSTRUCTION);
> > + return error;
> > +}
> > +
> > +/*
> > + * Retrieve a merkle tree block.
> > + */
> > +static struct page *
> > +xfs_fsverity_read_merkle(
> > + struct inode *inode,
> > + pgoff_t index,
> > + unsigned long num_ra_pages)
> > +{
> > + struct folio *folio;
> > + pgoff_t offset =
> > + index | (XFS_FSVERITY_REGION_START >> PAGE_SHIFT);
> > +
> > + folio = __filemap_get_folio(inode->i_mapping, offset, FGP_ACCESSED, 0);
> > + if (IS_ERR(folio) || !folio_test_uptodate(folio)) {
> > + DEFINE_READAHEAD(ractl, NULL, NULL, inode->i_mapping, offset);
> > +
> > + if (!IS_ERR(folio))
> > + folio_put(folio);
> > + else if (num_ra_pages > 1)
> > + page_cache_ra_unbounded(&ractl, num_ra_pages, 0);
> > + folio = read_mapping_folio(inode->i_mapping, offset, NULL);
> > + if (IS_ERR(folio))
> > + return ERR_CAST(folio);
> > + }
> > + return folio_file_page(folio, offset);
>
> Shouldn't this be some _BEYOND_EOF variant of generic_file_read_iter?
_BEYOND_EOF is set by xfs_iomap_read_begin() while mapping the
blocks, and this also uses the _unbounded version of readahead to skip
EOF checks.
>
> > +}
> > +
> > +/*
> > + * Write a merkle tree block.
> > + */
> > +static int
> > +xfs_fsverity_write_merkle(
> > + struct inode *inode,
> > + const void *buf,
> > + u64 pos,
> > + unsigned int size)
> > +{
> > + struct xfs_inode *ip = XFS_I(inode);
> > + loff_t position = pos | XFS_FSVERITY_REGION_START;
> > +
> > + if (position + size > inode->i_sb->s_maxbytes)
> > + return -EFBIG;
> > +
> > + return xfs_fsverity_write(ip, position, size, buf);
> > +}
> > +
> > +const ptrdiff_t info_offs = (int)offsetof(struct xfs_inode, i_verity_info) -
> > + (int)offsetof(struct xfs_inode, i_vnode);
>
> I ... wow.
>
> Not blaming you for writing this, just surprised that the common code
> makes you do that.
>
> > +const struct fsverity_operations xfs_fsverity_ops = {
> > + .inode_info_offs = info_offs,
> > + .begin_enable_verity = xfs_fsverity_begin_enable,
> > + .end_enable_verity = xfs_fsverity_end_enable,
> > + .get_verity_descriptor = xfs_fsverity_get_descriptor,
> > + .read_merkle_tree_page = xfs_fsverity_read_merkle,
> > + .write_merkle_tree_block = xfs_fsverity_write_merkle,
> > +};
> > diff --git a/fs/xfs/xfs_fsverity.h b/fs/xfs/xfs_fsverity.h
> > new file mode 100644
> > index 0000000000..8b0d7ef456
> > --- /dev/null
> > +++ b/fs/xfs/xfs_fsverity.h
> > @@ -0,0 +1,12 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +/*
> > + * Copyright (C) 2022 Red Hat, Inc.
> > + */
> > +#ifndef __XFS_FSVERITY_H__
> > +#define __XFS_FSVERITY_H__
> > +
> > +#ifdef CONFIG_FS_VERITY
> > +extern const struct fsverity_operations xfs_fsverity_ops;
> > +#endif /* CONFIG_FS_VERITY */
> > +
> > +#endif /* __XFS_FSVERITY_H__ */
> > diff --git a/fs/xfs/xfs_message.c b/fs/xfs/xfs_message.c
> > index 19aba2c3d5..17f0f0ca7b 100644
> > --- a/fs/xfs/xfs_message.c
> > +++ b/fs/xfs/xfs_message.c
> > @@ -161,6 +161,10 @@
> > .opstate = XFS_OPSTATE_WARNED_ZONED,
> > .name = "zoned RT device",
> > },
> > + [XFS_EXPERIMENTAL_FSVERITY] = {
> > + .opstate = XFS_OPSTATE_WARNED_ZONED,
> > + .name = "fsverity",
> > + },
> > };
> > ASSERT(feat >= 0 && feat < XFS_EXPERIMENTAL_MAX);
> > BUILD_BUG_ON(ARRAY_SIZE(features) != XFS_EXPERIMENTAL_MAX);
> > diff --git a/fs/xfs/xfs_message.h b/fs/xfs/xfs_message.h
> > index d68e72379f..1647d32ea4 100644
> > --- a/fs/xfs/xfs_message.h
> > +++ b/fs/xfs/xfs_message.h
> > @@ -96,6 +96,7 @@
> > XFS_EXPERIMENTAL_LBS,
> > XFS_EXPERIMENTAL_METADIR,
> > XFS_EXPERIMENTAL_ZONED,
> > + XFS_EXPERIMENTAL_FSVERITY,
> >
> > XFS_EXPERIMENTAL_MAX,
> > };
> > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> > index 10c6fc8d20..42a16b15a6 100644
> > --- a/fs/xfs/xfs_super.c
> > +++ b/fs/xfs/xfs_super.c
> > @@ -30,6 +30,7 @@
> > #include "xfs_filestream.h"
> > #include "xfs_quota.h"
> > #include "xfs_sysfs.h"
> > +#include "xfs_fsverity.h"
> > #include "xfs_ondisk.h"
> > #include "xfs_rmap_item.h"
> > #include "xfs_refcount_item.h"
> > @@ -54,6 +55,7 @@
> > #include <linux/fs_context.h>
> > #include <linux/fs_parser.h>
> > #include <linux/fsverity.h>
> > +#include <linux/iomap.h>
> >
> > static const struct super_operations xfs_super_operations;
> >
> > @@ -1706,6 +1708,9 @@
> > sb->s_quota_types = QTYPE_MASK_USR | QTYPE_MASK_GRP | QTYPE_MASK_PRJ;
> > #endif
> > sb->s_op = &xfs_super_operations;
> > +#ifdef CONFIG_FS_VERITY
> > + sb->s_vop = &xfs_fsverity_ops;
> > +#endif
> >
> > /*
> > * Delay mount work if the debug hook is set. This is debug
> > @@ -1959,10 +1964,19 @@
> > xfs_set_resuming_quotaon(mp);
> > mp->m_qflags &= ~XFS_QFLAGS_MNTOPTS;
> >
> > + if (xfs_has_verity(mp))
> > + xfs_warn_experimental(mp, XFS_EXPERIMENTAL_FSVERITY);
> > +
> > error = xfs_mountfs(mp);
> > if (error)
> > goto out_filestream_unmount;
> >
> > +#ifdef CONFIG_FS_VERITY
> > + error = iomap_fsverity_init_bioset();
>
> if (xfs_has_verity()) ?
sure, will change it
--
- Andrey
* Re: [PATCH v2 16/22] xfs: add fs-verity support
2026-01-13 18:32 ` Andrey Albershteyn
@ 2026-01-14 16:40 ` Darrick J. Wong
0 siblings, 0 replies; 86+ messages in thread
From: Darrick J. Wong @ 2026-01-14 16:40 UTC (permalink / raw)
To: Andrey Albershteyn
Cc: fsverity, linux-xfs, ebiggers, linux-fsdevel, aalbersh, david,
hch
On Tue, Jan 13, 2026 at 07:32:06PM +0100, Andrey Albershteyn wrote:
> On 2026-01-12 15:05:48, Darrick J. Wong wrote:
> > On Mon, Jan 12, 2026 at 03:51:43PM +0100, Andrey Albershteyn wrote:
> > > Add integration with fs-verity. XFS stores fs-verity descriptor and
> > > Merkle tree in the inode data fork at offset file offset (1 << 53).
> > >
> > > The Merkle tree reading/writing is done through iomap interface. The
> > > data itself are read to the inode's page cache. When XFS reads from this
> > > region iomap doesn't call into fsverity to verify it against Merkle
> > > tree. For data, verification is done on BIO completion in a workqueue.
> > >
> > > When fs-verity is enabled on an inode, the XFS_IVERITY_CONSTRUCTION
> > > flag is set meaning that the Merkle tree is being build. The
> > > initialization ends with storing of verity descriptor and setting
> > > inode on-disk flag (XFS_DIFLAG2_VERITY).
<snip some of this stuff>
> > > +/*
> > > + * Try to remove all the fsverity metadata after a failed enablement.
> > > + */
> > > +static int
> > > +xfs_fsverity_delete_metadata(
> > > + struct xfs_inode *ip)
> > > +{
> > > + struct xfs_trans *tp;
> > > + struct xfs_mount *mp = ip->i_mount;
> > > + int error;
> > > +
> > > + error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, 0, 0, 0, &tp);
> > > + if (error)
> > > + return error;
> > > +
> > > + xfs_ilock(ip, XFS_MMAPLOCK_EXCL);
> >
> > The MMAPLOCK should be taken before the transaction allocation like
> > everything else in xfs.
>
> hmm, but is it needed for the transaction? I meant to wrap only
> xfs_truncate_page() with the MMAPLOCK
The MMAPLOCK isn't needed to run the transaction, but the normal
resource acquisition order is:

i_rwsem (IOLOCK) -> mmap_invalidatelock (MMAPLOCK) -> log grant space
(aka xfs_trans_reserve) -> ILOCK.

By taking the MMAPLOCK after log space has been granted, you're now
risking an ABBA deadlock: another thread could hold the same MMAPLOCK
while waiting for a log grant, while the current thread has taken the
last of the log grant space and is now waiting for the MMAPLOCK.
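
The ordering rule can be modelled in userspace with a tiny rank checker, sketched below. It is not kernel code and takes no real locks. The enum, struct and function names are hypothetical, and the "as suggested" sequence is only one reading of the advice above (MMAPLOCK taken before the reservation and held until the end).

```c
#include <assert.h>
#include <stdbool.h>

/* Ranks follow the order listed above: lower rank is acquired first. */
enum model_res {
	MODEL_IOLOCK = 0,	/* i_rwsem */
	MODEL_MMAPLOCK,		/* mapping invalidate lock */
	MODEL_LOG_GRANT,	/* transaction reservation */
	MODEL_ILOCK,
	MODEL_RES_MAX,
};

struct model_held {
	unsigned int mask;	/* bit n set: resource of rank n is held */
	bool violated;		/* set once an out-of-order acquire is seen */
};

void model_acquire(struct model_held *h, enum model_res r)
{
	assert(r < MODEL_RES_MAX);
	/* Holding anything of higher rank means the order is inverted. */
	if (h->mask >> (r + 1))
		h->violated = true;
	h->mask |= 1u << r;
}

void model_release(struct model_held *h, enum model_res r)
{
	assert(r < MODEL_RES_MAX);
	h->mask &= ~(1u << r);
}

/* Sequence as in the posted patch: reservation first, then MMAPLOCK. */
bool model_order_as_posted(void)
{
	struct model_held h = { 0 };

	model_acquire(&h, MODEL_IOLOCK);	/* held by the caller */
	model_acquire(&h, MODEL_LOG_GRANT);
	model_acquire(&h, MODEL_MMAPLOCK);	/* inverted vs. LOG_GRANT */
	model_release(&h, MODEL_MMAPLOCK);
	model_acquire(&h, MODEL_ILOCK);
	return !h.violated;
}

/* Sequence with the MMAPLOCK taken before the reservation. */
bool model_order_as_suggested(void)
{
	struct model_held h = { 0 };

	model_acquire(&h, MODEL_IOLOCK);
	model_acquire(&h, MODEL_MMAPLOCK);
	model_acquire(&h, MODEL_LOG_GRANT);
	model_acquire(&h, MODEL_ILOCK);
	return !h.violated;
}
```

In this model the posted sequence is flagged and the reordered one is not. The checker only detects rank inversions within one thread's sequence, which is the precondition for the ABBA deadlock described above.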
Why are we calling xfs_truncate_page on the EOF page?

Should we hold the MMAPLOCK during the entire merkle tree construction
process so that page_mkwrite is blocked?

Also, should page_mkwrite return an error if either S_VERITY or
XFS_VERITY_CONSTRUCTION are set on the inode?
--D
> > > + error = xfs_truncate_page(ip, XFS_ISIZE(ip), NULL, NULL);
> > > + xfs_iunlock(ip, XFS_MMAPLOCK_EXCL);
> > > +
> > > + xfs_ilock(ip, XFS_ILOCK_EXCL);
> > > + xfs_trans_ijoin(tp, ip, 0);
> > > +
> > > + /*
> > > + * We removing post EOF data, no need to update i_size
> > > + */
> > > + error = xfs_itruncate_extents(&tp, ip, XFS_DATA_FORK, XFS_ISIZE(ip));
> > > + if (error)
> > > + goto err_cancel;
> > > +
> > > + error = xfs_trans_commit(tp);
> > > + if (error)
> > > + goto err_cancel;
> > > + xfs_iunlock(ip, XFS_ILOCK_EXCL);
> > > +
> > > + return error;
> > > +
> > > +err_cancel:
> > > + xfs_iunlock(ip, XFS_ILOCK_EXCL);
> > > + xfs_trans_cancel(tp);
> > > + return error;
> > > +}
> > > +
> > > +
> > > +/*
> > > + * Prepare to enable fsverity by clearing old metadata.
> > > + */
> > > +static int
> > > +xfs_fsverity_begin_enable(
> > > + struct file *filp)
> > > +{
> > > + struct inode *inode = file_inode(filp);
> > > + struct xfs_inode *ip = XFS_I(inode);
> > > + int error;
> > > +
> > > + xfs_assert_ilocked(ip, XFS_IOLOCK_EXCL);
> > > +
> > > + if (IS_DAX(inode))
> > > + return -EINVAL;
> > > +
> > > + if (inode->i_size > XFS_FSVERITY_REGION_START)
> > > + return -EFBIG;
> > > +
> > > + if (xfs_iflags_test_and_set(ip, XFS_VERITY_CONSTRUCTION))
> > > + return -EBUSY;
> > > +
> > > + error = xfs_qm_dqattach(ip);
> > > + if (error)
> > > + return error;
> > > +
> > > + /*
> > > + * Flush pagecache before building Merkle tree. Inode is locked and no
> > > + * further writes will happen to the file except fsverity metadata
> >
> > Don't we need to take the MMAPLOCK to prevent concurrent write faults?
>
> will add it
>
> >
> > > + */
> > > + error = filemap_write_and_wait(inode->i_mapping);
> > > + if (error)
> > > + return error;
> > > +
> > > + return xfs_fsverity_delete_metadata(ip);
> > > +}
> > > +
> > > +/*
> > > + * Complete (or fail) the process of enabling fsverity.
> > > + */
> > > +static int
> > > +xfs_fsverity_end_enable(
> > > + struct file *filp,
> > > + const void *desc,
> > > + size_t desc_size,
> > > + u64 merkle_tree_size)
> > > +{
> > > + struct inode *inode = file_inode(filp);
> > > + struct xfs_inode *ip = XFS_I(inode);
> > > + struct xfs_mount *mp = ip->i_mount;
> > > + struct xfs_trans *tp;
> > > + int error = 0;
> > > +
> > > + xfs_assert_ilocked(ip, XFS_IOLOCK_EXCL);
> > > +
> > > + /* fs-verity failed, just cleanup */
> > > + if (desc == NULL)
> > > + goto out;
> > > +
> > > + error = xfs_fsverity_write_descriptor(ip, desc, desc_size,
> > > + merkle_tree_size);
> > > + if (error)
> > > + goto out;
> > > +
> > > + /*
> > > + * Wait for the Merkle tree to be written to disk before setting the
> > > + * on-disk inode flag and clearing XFS_VERITY_CONSTRUCTION
> > > + */
> > > + error = filemap_write_and_wait(inode->i_mapping);
> > > + if (error)
> > > + goto out;
> > > +
> > > + /*
> > > + * Set fsverity inode flag
> > > + */
> > > + error = xfs_trans_alloc_inode(ip, &M_RES(mp)->tr_ichange,
> > > + 0, 0, false, &tp);
> > > + if (error)
> > > + goto out;
> > > +
> > > + /*
> > > + * Ensure that we've persisted the verity information before we enable
> > > + * it on the inode and tell the caller we have sealed the inode.
> > > + */
> > > + ip->i_diflags2 |= XFS_DIFLAG2_VERITY;
> > > +
> > > + xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
> > > + xfs_trans_set_sync(tp);
> > > +
> > > + error = xfs_trans_commit(tp);
> > > + xfs_iunlock(ip, XFS_ILOCK_EXCL);
> > > +
> > > + if (!error)
> > > + inode->i_flags |= S_VERITY;
> > > +
> > > +out:
> > > + if (error) {
> > > + int error2;
> > > +
> > > + error2 = xfs_fsverity_delete_metadata(ip);
> > > + if (error2)
> > > + xfs_alert(ip->i_mount,
> > > +"ino 0x%llx failed to clean up new fsverity metadata, err %d",
> > > + ip->i_ino, error2);
> > > + }
> > > +
> > > + xfs_iflags_clear(ip, XFS_VERITY_CONSTRUCTION);
> > > + return error;
> > > +}
> > > +
> > > +/*
> > > + * Retrieve a merkle tree block.
> > > + */
> > > +static struct page *
> > > +xfs_fsverity_read_merkle(
> > > + struct inode *inode,
> > > + pgoff_t index,
> > > + unsigned long num_ra_pages)
> > > +{
> > > + struct folio *folio;
> > > + pgoff_t offset =
> > > + index | (XFS_FSVERITY_REGION_START >> PAGE_SHIFT);
> > > +
> > > + folio = __filemap_get_folio(inode->i_mapping, offset, FGP_ACCESSED, 0);
> > > + if (IS_ERR(folio) || !folio_test_uptodate(folio)) {
> > > + DEFINE_READAHEAD(ractl, NULL, NULL, inode->i_mapping, offset);
> > > +
> > > + if (!IS_ERR(folio))
> > > + folio_put(folio);
> > > + else if (num_ra_pages > 1)
> > > + page_cache_ra_unbounded(&ractl, num_ra_pages, 0);
> > > + folio = read_mapping_folio(inode->i_mapping, offset, NULL);
> > > + if (IS_ERR(folio))
> > > + return ERR_CAST(folio);
> > > + }
> > > + return folio_file_page(folio, offset);
> >
> > Shouldn't this be some _BEYOND_EOF variant of generic_file_read_iter?
>
> _BEYOND_EOF is set by xfs_iomap_read_begin() while mapping the
> blocks, and this also uses the _unbounded version of readahead to
> skip EOF checks.
>
> >
> > > +}
> > > +
> > > +/*
> > > + * Write a merkle tree block.
> > > + */
> > > +static int
> > > +xfs_fsverity_write_merkle(
> > > + struct inode *inode,
> > > + const void *buf,
> > > + u64 pos,
> > > + unsigned int size)
> > > +{
> > > + struct xfs_inode *ip = XFS_I(inode);
> > > + loff_t position = pos | XFS_FSVERITY_REGION_START;
> > > +
> > > + if (position + size > inode->i_sb->s_maxbytes)
> > > + return -EFBIG;
> > > +
> > > + return xfs_fsverity_write(ip, position, size, buf);
> > > +}
> > > +
> > > +const ptrdiff_t info_offs = (int)offsetof(struct xfs_inode, i_verity_info) -
> > > + (int)offsetof(struct xfs_inode, i_vnode);
> >
> > I ... wow.
> >
> > Not blaming you for writing this, just surprised that the common code
> > makes you do that.
> >
> > > +const struct fsverity_operations xfs_fsverity_ops = {
> > > + .inode_info_offs = info_offs,
> > > + .begin_enable_verity = xfs_fsverity_begin_enable,
> > > + .end_enable_verity = xfs_fsverity_end_enable,
> > > + .get_verity_descriptor = xfs_fsverity_get_descriptor,
> > > + .read_merkle_tree_page = xfs_fsverity_read_merkle,
> > > + .write_merkle_tree_block = xfs_fsverity_write_merkle,
> > > +};
> > > diff --git a/fs/xfs/xfs_fsverity.h b/fs/xfs/xfs_fsverity.h
> > > new file mode 100644
> > > index 0000000000..8b0d7ef456
> > > --- /dev/null
> > > +++ b/fs/xfs/xfs_fsverity.h
> > > @@ -0,0 +1,12 @@
> > > +/* SPDX-License-Identifier: GPL-2.0 */
> > > +/*
> > > + * Copyright (C) 2022 Red Hat, Inc.
> > > + */
> > > +#ifndef __XFS_FSVERITY_H__
> > > +#define __XFS_FSVERITY_H__
> > > +
> > > +#ifdef CONFIG_FS_VERITY
> > > +extern const struct fsverity_operations xfs_fsverity_ops;
> > > +#endif /* CONFIG_FS_VERITY */
> > > +
> > > +#endif /* __XFS_FSVERITY_H__ */
> > > diff --git a/fs/xfs/xfs_message.c b/fs/xfs/xfs_message.c
> > > index 19aba2c3d5..17f0f0ca7b 100644
> > > --- a/fs/xfs/xfs_message.c
> > > +++ b/fs/xfs/xfs_message.c
> > > @@ -161,6 +161,10 @@
> > > .opstate = XFS_OPSTATE_WARNED_ZONED,
> > > .name = "zoned RT device",
> > > },
> > > + [XFS_EXPERIMENTAL_FSVERITY] = {
> > > + .opstate = XFS_OPSTATE_WARNED_ZONED,
> > > + .name = "fsverity",
> > > + },
> > > };
> > > ASSERT(feat >= 0 && feat < XFS_EXPERIMENTAL_MAX);
> > > BUILD_BUG_ON(ARRAY_SIZE(features) != XFS_EXPERIMENTAL_MAX);
> > > diff --git a/fs/xfs/xfs_message.h b/fs/xfs/xfs_message.h
> > > index d68e72379f..1647d32ea4 100644
> > > --- a/fs/xfs/xfs_message.h
> > > +++ b/fs/xfs/xfs_message.h
> > > @@ -96,6 +96,7 @@
> > > XFS_EXPERIMENTAL_LBS,
> > > XFS_EXPERIMENTAL_METADIR,
> > > XFS_EXPERIMENTAL_ZONED,
> > > + XFS_EXPERIMENTAL_FSVERITY,
> > >
> > > XFS_EXPERIMENTAL_MAX,
> > > };
> > > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> > > index 10c6fc8d20..42a16b15a6 100644
> > > --- a/fs/xfs/xfs_super.c
> > > +++ b/fs/xfs/xfs_super.c
> > > @@ -30,6 +30,7 @@
> > > #include "xfs_filestream.h"
> > > #include "xfs_quota.h"
> > > #include "xfs_sysfs.h"
> > > +#include "xfs_fsverity.h"
> > > #include "xfs_ondisk.h"
> > > #include "xfs_rmap_item.h"
> > > #include "xfs_refcount_item.h"
> > > @@ -54,6 +55,7 @@
> > > #include <linux/fs_context.h>
> > > #include <linux/fs_parser.h>
> > > #include <linux/fsverity.h>
> > > +#include <linux/iomap.h>
> > >
> > > static const struct super_operations xfs_super_operations;
> > >
> > > @@ -1706,6 +1708,9 @@
> > > sb->s_quota_types = QTYPE_MASK_USR | QTYPE_MASK_GRP | QTYPE_MASK_PRJ;
> > > #endif
> > > sb->s_op = &xfs_super_operations;
> > > +#ifdef CONFIG_FS_VERITY
> > > + sb->s_vop = &xfs_fsverity_ops;
> > > +#endif
> > >
> > > /*
> > > * Delay mount work if the debug hook is set. This is debug
> > > @@ -1959,10 +1964,19 @@
> > > xfs_set_resuming_quotaon(mp);
> > > mp->m_qflags &= ~XFS_QFLAGS_MNTOPTS;
> > >
> > > + if (xfs_has_verity(mp))
> > > + xfs_warn_experimental(mp, XFS_EXPERIMENTAL_FSVERITY);
> > > +
> > > error = xfs_mountfs(mp);
> > > if (error)
> > > goto out_filestream_unmount;
> > >
> > > +#ifdef CONFIG_FS_VERITY
> > > + error = iomap_fsverity_init_bioset();
> >
> > if (xfs_has_verity()) ?
>
> sure, will change it
>
> --
> - Andrey
>
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH v2 16/22] xfs: add fs-verity support
2026-01-12 23:05 ` Darrick J. Wong
2026-01-13 18:32 ` Andrey Albershteyn
@ 2026-01-16 14:52 ` Andrey Albershteyn
1 sibling, 0 replies; 86+ messages in thread
From: Andrey Albershteyn @ 2026-01-16 14:52 UTC (permalink / raw)
To: Darrick J. Wong
Cc: fsverity, linux-xfs, ebiggers, linux-fsdevel, aalbersh, david,
hch
> > + desc_pos = round_down(desc_size_pos - desc_size, blocksize);
> > + error = xfs_fsverity_read(inode, buf, desc_size, desc_pos);
> > + if (error)
> > + return error;
> > +
> > + return desc_size;
> > +}
>
> You might want to wrap the integrity checks through XFS_IS_CORRUPT so
> that we get some logging on corrupt fsverity data. Also, if descriptor
> corruption doesn't prevent iget from completing, then we ought to define
> a new health state for the xfs_inode so that it can report those kinds
> of failures via bulkstat.
Yeah, iget will complete, as it doesn't trigger fsverity to read the
descriptor. I will add a new health state. Thanks!
--
- Andrey
^ permalink raw reply [flat|nested] 86+ messages in thread
* [PATCH v2 17/22] xfs: add fs-verity ioctls
2026-01-12 14:49 [PATCH v2 0/23] fs-verity support for XFS with post EOF merkle tree Andrey Albershteyn
` (15 preceding siblings ...)
2026-01-12 14:51 ` [PATCH v2 16/22] xfs: add fs-verity support Andrey Albershteyn
@ 2026-01-12 14:51 ` Andrey Albershteyn
2026-01-12 14:52 ` [PATCH v2 18/22] xfs: advertise fs-verity being available on filesystem Darrick J. Wong
` (5 subsequent siblings)
22 siblings, 0 replies; 86+ messages in thread
From: Andrey Albershteyn @ 2026-01-12 14:51 UTC (permalink / raw)
To: fsverity, linux-xfs, ebiggers, linux-fsdevel, aalbersh, aalbersh,
djwong
Cc: djwong, david, hch
Add fs-verity ioctls to enable verity, dump metadata (descriptor and
Merkle tree pages) and obtain the file's digest.
[djwong: remove unnecessary casting]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
---
fs/xfs/xfs_ioctl.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+), 0 deletions(-)
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 59eaad7743..d343af6aa0 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -44,6 +44,7 @@
#include <linux/mount.h>
#include <linux/fileattr.h>
+#include <linux/fsverity.h>
/* Return 0 on success or positive error */
int
@@ -1419,6 +1420,21 @@
case XFS_IOC_COMMIT_RANGE:
return xfs_ioc_commit_range(filp, arg);
+ case FS_IOC_ENABLE_VERITY:
+ if (!xfs_has_verity(mp))
+ return -EOPNOTSUPP;
+ return fsverity_ioctl_enable(filp, arg);
+
+ case FS_IOC_MEASURE_VERITY:
+ if (!xfs_has_verity(mp))
+ return -EOPNOTSUPP;
+ return fsverity_ioctl_measure(filp, arg);
+
+ case FS_IOC_READ_VERITY_METADATA:
+ if (!xfs_has_verity(mp))
+ return -EOPNOTSUPP;
+ return fsverity_ioctl_read_metadata(filp, arg);
+
default:
return -ENOTTY;
}
--
- Andrey
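
For readers who want to exercise these ioctls from user space, here is a
minimal sketch that is not part of the patch. It assumes the UAPI header
<linux/fsverity.h> is installed; the helper names enable_verity() and
measure_verity() are hypothetical, and SHA-256 with 4096-byte Merkle tree
blocks is just one possible parameter choice. On an XFS filesystem without
the verity feature, the patch above would make both calls fail with
EOPNOTSUPP.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/fsverity.h>

/*
 * Hypothetical helper: enable fs-verity on a file descriptor that was
 * opened O_RDONLY. Returns 0 on success or a negative errno.
 */
static int enable_verity(int fd)
{
	struct fsverity_enable_arg arg;

	memset(&arg, 0, sizeof(arg));
	arg.version = 1;
	arg.hash_algorithm = FS_VERITY_HASH_ALG_SHA256;
	arg.block_size = 4096;

	if (ioctl(fd, FS_IOC_ENABLE_VERITY, &arg) < 0)
		return -errno;
	return 0;
}

/*
 * Hypothetical helper: copy the file's fs-verity digest into out.
 * Returns the digest size in bytes or a negative errno.
 */
static int measure_verity(int fd, unsigned char *out, size_t outlen)
{
	struct fsverity_digest *d;
	int ret;

	assert(out != NULL);

	/* digest_size is a 16-bit field, so clamp the capacity. */
	if (outlen > 0xffff)
		outlen = 0xffff;

	d = calloc(1, sizeof(*d) + outlen);
	if (!d)
		return -ENOMEM;
	d->digest_size = outlen;	/* buffer capacity on input */

	if (ioctl(fd, FS_IOC_MEASURE_VERITY, d) < 0) {
		ret = -errno;
	} else {
		/* The kernel rewrites digest_size to the actual size. */
		memcpy(out, d->digest, d->digest_size);
		ret = d->digest_size;
	}
	free(d);
	return ret;
}
```

Both helpers return a negative errno so that a caller can tell EOPNOTSUPP
(no verity support on this filesystem) apart from other failures.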
^ permalink raw reply related [flat|nested] 86+ messages in thread
* [PATCH v2 18/22] xfs: advertise fs-verity being available on filesystem
2026-01-12 14:49 [PATCH v2 0/23] fs-verity support for XFS with post EOF merkle tree Andrey Albershteyn
` (16 preceding siblings ...)
2026-01-12 14:51 ` [PATCH v2 17/22] xfs: add fs-verity ioctls Andrey Albershteyn
@ 2026-01-12 14:52 ` Darrick J. Wong
2026-01-12 14:52 ` [PATCH v2 19/22] xfs: check and repair the verity inode flag state Darrick J. Wong
` (4 subsequent siblings)
22 siblings, 0 replies; 86+ messages in thread
From: Darrick J. Wong @ 2026-01-12 14:52 UTC (permalink / raw)
To: fsverity, linux-xfs, ebiggers, linux-fsdevel, aalbersh, aalbersh,
djwong
Cc: djwong, david, hch
Advertise that this filesystem supports fsverity.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Andrey Albershteyn <aalbersh@redhat.com>
Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
---
fs/xfs/libxfs/xfs_fs.h | 1 +
fs/xfs/libxfs/xfs_sb.c | 2 ++
2 files changed, 3 insertions(+), 0 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index b73458a7c2..3db9beb579 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -250,6 +250,7 @@
#define XFS_FSOP_GEOM_FLAGS_PARENT (1 << 25) /* linux parent pointers */
#define XFS_FSOP_GEOM_FLAGS_METADIR (1 << 26) /* metadata directories */
#define XFS_FSOP_GEOM_FLAGS_ZONED (1 << 27) /* zoned rt device */
+#define XFS_FSOP_GEOM_FLAGS_VERITY (1 << 28) /* fs-verity */
/*
* Minimum and maximum sizes need for growth checks.
diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
index 744bd8480b..fe400bffa5 100644
--- a/fs/xfs/libxfs/xfs_sb.c
+++ b/fs/xfs/libxfs/xfs_sb.c
@@ -1587,6 +1587,8 @@
geo->flags |= XFS_FSOP_GEOM_FLAGS_METADIR;
if (xfs_has_zoned(mp))
geo->flags |= XFS_FSOP_GEOM_FLAGS_ZONED;
+ if (xfs_has_verity(mp))
+ geo->flags |= XFS_FSOP_GEOM_FLAGS_VERITY;
geo->rtsectsize = sbp->sb_blocksize;
geo->dirblocksize = xfs_dir2_dirblock_bytes(sbp);
--
- Andrey
^ permalink raw reply related [flat|nested] 86+ messages in thread
* [PATCH v2 19/22] xfs: check and repair the verity inode flag state
2026-01-12 14:49 [PATCH v2 0/23] fs-verity support for XFS with post EOF merkle tree Andrey Albershteyn
` (17 preceding siblings ...)
2026-01-12 14:52 ` [PATCH v2 18/22] xfs: advertise fs-verity being available on filesystem Darrick J. Wong
@ 2026-01-12 14:52 ` Darrick J. Wong
2026-01-12 14:52 ` [PATCH v2 20/22] xfs: report verity failures through the health system Darrick J. Wong
` (3 subsequent siblings)
22 siblings, 0 replies; 86+ messages in thread
From: Darrick J. Wong @ 2026-01-12 14:52 UTC (permalink / raw)
To: fsverity, linux-xfs, ebiggers, linux-fsdevel, aalbersh, aalbersh,
djwong
Cc: djwong, david, hch
If an inode has the incore verity iflag set, make sure that we can
actually activate fsverity on that inode. If activation fails due to
an fsverity metadata validation error, clear the flag. The usage model
for fsverity requires any program that cares about verity state to call
statx/getflags to check that the flag is set after opening the file, so
clearing the flag will not compromise that model.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
---
fs/xfs/scrub/attr.c | 7 ++++++
fs/xfs/scrub/common.c | 53 +++++++++++++++++++++++++++++++++++++++++++++++
fs/xfs/scrub/common.h | 2 +
fs/xfs/scrub/inode.c | 7 ++++++
fs/xfs/scrub/inode_repair.c | 36 +++++++++++++++++++++++++++++++
5 files changed, 105 insertions(+), 0 deletions(-)
diff --git a/fs/xfs/scrub/attr.c b/fs/xfs/scrub/attr.c
index 708334f9b2..b1448832ae 100644
--- a/fs/xfs/scrub/attr.c
+++ b/fs/xfs/scrub/attr.c
@@ -646,6 +646,13 @@
if (!xfs_inode_hasattr(sc->ip))
return -ENOENT;
+ /*
+ * If this is a verity file that won't activate, we cannot check the
+ * merkle tree geometry.
+ */
+ if (xchk_inode_verity_broken(sc->ip))
+ xchk_set_incomplete(sc);
+
/* Allocate memory for xattr checking. */
error = xchk_setup_xattr_buf(sc, 0);
if (error == -ENOMEM)
diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
index 7bfa37c994..41bbf4648f 100644
--- a/fs/xfs/scrub/common.c
+++ b/fs/xfs/scrub/common.c
@@ -45,6 +45,8 @@
#include "scrub/health.h"
#include "scrub/tempfile.h"
+#include <linux/fsverity.h>
+
/* Common code for the metadata scrubbers. */
/*
@@ -1736,3 +1738,54 @@
return xfs_bmap_count_blocks(sc->tp, sc->ip, whichfork, nextents,
count);
}
+
+/*
+ * If this inode has S_VERITY set on it, read the verity info. If the reading
+ * fails with anything other than ENOMEM, the file is corrupt, which we can
+ * detect later with fsverity_active.
+ *
+ * Callers must hold the IOLOCK and must not hold the ILOCK of sc->ip because
+ * activation reads inode data.
+ */
+int
+xchk_inode_setup_verity(
+ struct xfs_scrub *sc)
+{
+ int error;
+
+ if (!IS_VERITY(VFS_I(sc->ip)))
+ return 0;
+
+ error = fsverity_ensure_verity_info(VFS_I(sc->ip));
+ switch (error) {
+ case 0:
+ /* fsverity is active */
+ break;
+ case -ENODATA:
+ case -EMSGSIZE:
+ case -EINVAL:
+ case -EFSCORRUPTED:
+ case -EFBIG:
+ /*
+ * The nonzero errno codes above are the error codes that can
+ * be returned from fsverity on metadata validation errors.
+ */
+ return 0;
+ default:
+ /* runtime errors */
+ return error;
+ }
+
+ return 0;
+}
+
+/*
+ * Is this a verity file that failed to activate? Callers must have tried to
+ * activate fsverity via xchk_inode_setup_verity.
+ */
+bool
+xchk_inode_verity_broken(
+ struct xfs_inode *ip)
+{
+ return IS_VERITY(VFS_I(ip)) && !fsverity_active(VFS_I(ip));
+}
diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
index ddbc065c79..36d6a33337 100644
--- a/fs/xfs/scrub/common.h
+++ b/fs/xfs/scrub/common.h
@@ -289,6 +289,8 @@
bool *inuse);
int xchk_inode_count_blocks(struct xfs_scrub *sc, int whichfork,
xfs_extnum_t *nextents, xfs_filblks_t *count);
+int xchk_inode_setup_verity(struct xfs_scrub *sc);
+bool xchk_inode_verity_broken(struct xfs_inode *ip);
bool xchk_inode_is_dirtree_root(const struct xfs_inode *ip);
bool xchk_inode_is_sb_rooted(const struct xfs_inode *ip);
diff --git a/fs/xfs/scrub/inode.c b/fs/xfs/scrub/inode.c
index bb3f475b63..1e7cfef00a 100644
--- a/fs/xfs/scrub/inode.c
+++ b/fs/xfs/scrub/inode.c
@@ -36,6 +36,10 @@
xchk_ilock(sc, XFS_IOLOCK_EXCL);
+ error = xchk_inode_setup_verity(sc);
+ if (error)
+ return error;
+
error = xchk_trans_alloc(sc, 0);
if (error)
return error;
@@ -833,6 +837,9 @@
if (S_ISREG(VFS_I(sc->ip)->i_mode))
xchk_inode_check_reflink_iflag(sc, sc->ip->i_ino);
+ if (xchk_inode_verity_broken(sc->ip))
+ xchk_ino_set_corrupt(sc, sc->sm->sm_ino);
+
xchk_inode_check_unlinked(sc);
xchk_inode_xref(sc, sc->ip->i_ino, &di);
diff --git a/fs/xfs/scrub/inode_repair.c b/fs/xfs/scrub/inode_repair.c
index 4f7040c9dd..846a47286e 100644
--- a/fs/xfs/scrub/inode_repair.c
+++ b/fs/xfs/scrub/inode_repair.c
@@ -573,6 +573,8 @@
dip->di_nrext64_pad = 0;
else if (dip->di_version >= 3)
dip->di_v3_pad = 0;
+ if (!xfs_has_verity(mp) || !S_ISREG(mode))
+ flags2 &= ~XFS_DIFLAG2_VERITY;
if (flags2 & XFS_DIFLAG2_METADATA) {
xfs_failaddr_t fa;
@@ -1613,6 +1615,10 @@
if (iget_error)
return iget_error;
+ error = xchk_inode_setup_verity(sc);
+ if (error)
+ return error;
+
error = xchk_trans_alloc(sc, 0);
if (error)
return error;
@@ -2032,6 +2038,27 @@
return 0;
}
+/*
+ * If this file is a fsverity file, xchk_prepare_iscrub or xrep_dinode_core
+ * should have activated it. If it's still not active, then there's something
+ * wrong with the verity descriptor and we should turn it off.
+ */
+STATIC int
+xrep_inode_verity(
+ struct xfs_scrub *sc)
+{
+ struct inode *inode = VFS_I(sc->ip);
+
+ if (xchk_inode_verity_broken(sc->ip)) {
+ sc->ip->i_diflags2 &= ~XFS_DIFLAG2_VERITY;
+ inode->i_flags &= ~S_VERITY;
+
+ xfs_trans_log_inode(sc->tp, sc->ip, XFS_ILOG_CORE);
+ }
+
+ return 0;
+}
+
/* Repair an inode's fields. */
int
xrep_inode(
@@ -2081,6 +2108,15 @@
return error;
}
+ /*
+ * Disable fsverity if it cannot be activated. Activation failure
+ * prohibits the file from being opened, so there cannot be another
+ * program with an open fd to what it thinks is a verity file.
+ */
+ error = xrep_inode_verity(sc);
+ if (error)
+ return error;
+
/* Reconnect incore unlinked list */
error = xrep_inode_unlinked(sc);
if (error)
--
- Andrey
^ permalink raw reply related [flat|nested] 86+ messages in thread
* [PATCH v2 20/22] xfs: report verity failures through the health system
2026-01-12 14:49 [PATCH v2 0/23] fs-verity support for XFS with post EOF merkle tree Andrey Albershteyn
` (18 preceding siblings ...)
2026-01-12 14:52 ` [PATCH v2 19/22] xfs: check and repair the verity inode flag state Darrick J. Wong
@ 2026-01-12 14:52 ` Darrick J. Wong
2026-01-12 14:52 ` [PATCH v2 21/22] xfs: add fsverity traces Andrey Albershteyn
` (2 subsequent siblings)
22 siblings, 0 replies; 86+ messages in thread
From: Darrick J. Wong @ 2026-01-12 14:52 UTC (permalink / raw)
To: fsverity, linux-xfs, ebiggers, linux-fsdevel, aalbersh, aalbersh,
djwong
Cc: djwong, david, hch
Record verity failures and report them through the health system.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Andrey Albershteyn <aalbersh@redhat.com>
Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
---
fs/xfs/libxfs/xfs_fs.h | 1 +
fs/xfs/libxfs/xfs_health.h | 4 +++-
fs/xfs/xfs_fsverity.c | 11 +++++++++++
fs/xfs/xfs_health.c | 1 +
4 files changed, 16 insertions(+), 1 deletion(-)
diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index 3db9beb579..b82fef9a1f 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -422,6 +422,7 @@
#define XFS_BS_SICK_SYMLINK (1 << 6) /* symbolic link remote target */
#define XFS_BS_SICK_PARENT (1 << 7) /* parent pointers */
#define XFS_BS_SICK_DIRTREE (1 << 8) /* directory tree structure */
+#define XFS_BS_SICK_DATA (1 << 9) /* file data */
/*
* Project quota id helpers (previously projid was 16bit only
diff --git a/fs/xfs/libxfs/xfs_health.h b/fs/xfs/libxfs/xfs_health.h
index b31000f719..fa91916ad0 100644
--- a/fs/xfs/libxfs/xfs_health.h
+++ b/fs/xfs/libxfs/xfs_health.h
@@ -104,6 +104,7 @@
/* Don't propagate sick status to ag health summary during inactivation */
#define XFS_SICK_INO_FORGET (1 << 12)
#define XFS_SICK_INO_DIRTREE (1 << 13) /* directory tree structure */
+#define XFS_SICK_INO_DATA (1 << 14) /* file data */
/* Primary evidence of health problems in a given group. */
#define XFS_SICK_FS_PRIMARY (XFS_SICK_FS_COUNTERS | \
@@ -140,7 +141,8 @@
XFS_SICK_INO_XATTR | \
XFS_SICK_INO_SYMLINK | \
XFS_SICK_INO_PARENT | \
- XFS_SICK_INO_DIRTREE)
+ XFS_SICK_INO_DIRTREE | \
+ XFS_SICK_INO_DATA)
#define XFS_SICK_INO_ZAPPED (XFS_SICK_INO_BMBTD_ZAPPED | \
XFS_SICK_INO_BMBTA_ZAPPED | \
diff --git a/fs/xfs/xfs_fsverity.c b/fs/xfs/xfs_fsverity.c
index 691dc60778..f53a404578 100644
--- a/fs/xfs/xfs_fsverity.c
+++ b/fs/xfs/xfs_fsverity.c
@@ -18,6 +18,7 @@
#include "xfs_quota.h"
#include "xfs_fsverity.h"
#include "xfs_iomap.h"
+#include "xfs_health.h"
#include <linux/fsverity.h>
#include <linux/pagemap.h>
@@ -363,6 +364,15 @@
return xfs_fsverity_write(ip, position, size, buf);
}
+static void
+xfs_fsverity_file_corrupt(
+ struct inode *inode,
+ loff_t pos,
+ size_t len)
+{
+ xfs_inode_mark_sick(XFS_I(inode), XFS_SICK_INO_DATA);
+}
+
const ptrdiff_t info_offs = (int)offsetof(struct xfs_inode, i_verity_info) -
(int)offsetof(struct xfs_inode, i_vnode);
@@ -373,4 +383,5 @@
.get_verity_descriptor = xfs_fsverity_get_descriptor,
.read_merkle_tree_page = xfs_fsverity_read_merkle,
.write_merkle_tree_block = xfs_fsverity_write_merkle,
+ .file_corrupt = xfs_fsverity_file_corrupt,
};
diff --git a/fs/xfs/xfs_health.c b/fs/xfs/xfs_health.c
index 3c1557fb1c..b851651c02 100644
--- a/fs/xfs/xfs_health.c
+++ b/fs/xfs/xfs_health.c
@@ -487,6 +487,7 @@
{ XFS_SICK_INO_DIR_ZAPPED, XFS_BS_SICK_DIR },
{ XFS_SICK_INO_SYMLINK_ZAPPED, XFS_BS_SICK_SYMLINK },
{ XFS_SICK_INO_DIRTREE, XFS_BS_SICK_DIRTREE },
+ { XFS_SICK_INO_DATA, XFS_BS_SICK_DATA },
};
/* Fill out bulkstat health info. */
--
- Andrey
^ permalink raw reply related [flat|nested] 86+ messages in thread
* [PATCH v2 21/22] xfs: add fsverity traces
2026-01-12 14:49 [PATCH v2 0/23] fs-verity support for XFS with post EOF merkle tree Andrey Albershteyn
` (19 preceding siblings ...)
2026-01-12 14:52 ` [PATCH v2 20/22] xfs: report verity failures through the health system Darrick J. Wong
@ 2026-01-12 14:52 ` Andrey Albershteyn
2026-01-12 23:07 ` Darrick J. Wong
2026-01-12 14:52 ` [PATCH v2 22/22] xfs: enable ro-compat fs-verity flag Andrey Albershteyn
2026-01-13 16:36 ` [PATCH v2 0/23] fs-verity support for XFS with post EOF merkle tree Matthew Wilcox
22 siblings, 1 reply; 86+ messages in thread
From: Andrey Albershteyn @ 2026-01-12 14:52 UTC (permalink / raw)
To: fsverity, linux-xfs, ebiggers, linux-fsdevel, aalbersh, aalbersh,
djwong
Cc: djwong, david, hch
Even though fsverity has traces, debugging issues with varying block
sizes could be a bit less transparent without read/write traces.
Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
---
fs/xfs/xfs_fsverity.c | 8 ++++++++
fs/xfs/xfs_trace.h | 46 ++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 54 insertions(+), 0 deletions(-)
diff --git a/fs/xfs/xfs_fsverity.c b/fs/xfs/xfs_fsverity.c
index f53a404578..06eac2561b 100644
--- a/fs/xfs/xfs_fsverity.c
+++ b/fs/xfs/xfs_fsverity.c
@@ -102,6 +102,8 @@
uint32_t blocksize = i_blocksize(VFS_I(ip));
xfs_fileoff_t last_block;
+ trace_xfs_fsverity_get_descriptor(ip);
+
ASSERT(inode->i_flags & S_VERITY);
error = xfs_bmap_last_extent(NULL, ip, XFS_DATA_FORK, &rec, &is_empty);
if (error)
@@ -330,6 +332,8 @@
pgoff_t offset =
index | (XFS_FSVERITY_REGION_START >> PAGE_SHIFT);
+ trace_xfs_fsverity_read_merkle(XFS_I(inode), offset, PAGE_SIZE);
+
folio = __filemap_get_folio(inode->i_mapping, offset, FGP_ACCESSED, 0);
if (IS_ERR(folio) || !folio_test_uptodate(folio)) {
DEFINE_READAHEAD(ractl, NULL, NULL, inode->i_mapping, offset);
@@ -358,6 +362,8 @@
struct xfs_inode *ip = XFS_I(inode);
loff_t position = pos | XFS_FSVERITY_REGION_START;
+ trace_xfs_fsverity_write_merkle(XFS_I(inode), pos, size);
+
if (position + size > inode->i_sb->s_maxbytes)
return -EFBIG;
@@ -370,6 +376,8 @@
loff_t pos,
size_t len)
{
+ trace_xfs_fsverity_file_corrupt(XFS_I(inode), pos, len);
+
xfs_inode_mark_sick(XFS_I(inode), XFS_SICK_INO_DATA);
}
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index f70afbf3cb..1ce4e10b6b 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -5906,6 +5906,52 @@
DEFINE_FREEBLOCKS_RESV_EVENT(xfs_freecounter_reserved);
DEFINE_FREEBLOCKS_RESV_EVENT(xfs_freecounter_enospc);
+TRACE_EVENT(xfs_fsverity_get_descriptor,
+ TP_PROTO(struct xfs_inode *ip),
+ TP_ARGS(ip),
+ TP_STRUCT__entry(
+ __field(dev_t, dev)
+ __field(xfs_ino_t, ino)
+ ),
+ TP_fast_assign(
+ __entry->dev = VFS_I(ip)->i_sb->s_dev;
+ __entry->ino = ip->i_ino;
+ ),
+ TP_printk("dev %d:%d ino 0x%llx",
+ MAJOR(__entry->dev), MINOR(__entry->dev),
+ __entry->ino)
+);
+
+DECLARE_EVENT_CLASS(xfs_fsverity_class,
+ TP_PROTO(struct xfs_inode *ip, u64 pos, unsigned int length),
+ TP_ARGS(ip, pos, length),
+ TP_STRUCT__entry(
+ __field(dev_t, dev)
+ __field(xfs_ino_t, ino)
+ __field(u64, pos)
+ __field(unsigned int, length)
+ ),
+ TP_fast_assign(
+ __entry->dev = VFS_I(ip)->i_sb->s_dev;
+ __entry->ino = ip->i_ino;
+ __entry->pos = pos;
+ __entry->length = length;
+ ),
+ TP_printk("dev %d:%d ino 0x%llx pos %llx length %x",
+ MAJOR(__entry->dev), MINOR(__entry->dev),
+ __entry->ino,
+ __entry->pos,
+ __entry->length)
+)
+
+#define DEFINE_FSVERITY_EVENT(name) \
+DEFINE_EVENT(xfs_fsverity_class, name, \
+ TP_PROTO(struct xfs_inode *ip, u64 pos, unsigned int length), \
+ TP_ARGS(ip, pos, length))
+DEFINE_FSVERITY_EVENT(xfs_fsverity_read_merkle);
+DEFINE_FSVERITY_EVENT(xfs_fsverity_write_merkle);
+DEFINE_FSVERITY_EVENT(xfs_fsverity_file_corrupt);
+
#endif /* _TRACE_XFS_H */
#undef TRACE_INCLUDE_PATH
--
- Andrey
^ permalink raw reply related [flat|nested] 86+ messages in thread
* Re: [PATCH v2 21/22] xfs: add fsverity traces
2026-01-12 14:52 ` [PATCH v2 21/22] xfs: add fsverity traces Andrey Albershteyn
@ 2026-01-12 23:07 ` Darrick J. Wong
0 siblings, 0 replies; 86+ messages in thread
From: Darrick J. Wong @ 2026-01-12 23:07 UTC (permalink / raw)
To: Andrey Albershteyn
Cc: fsverity, linux-xfs, ebiggers, linux-fsdevel, aalbersh, david,
hch
On Mon, Jan 12, 2026 at 03:52:20PM +0100, Andrey Albershteyn wrote:
> Even though fsverity has traces, debugging issues with varying block
> sizes could be a bit less transparent without read/write traces.
>
> Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
> ---
> fs/xfs/xfs_fsverity.c | 8 ++++++++
> fs/xfs/xfs_trace.h | 46 ++++++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 54 insertions(+), 0 deletions(-)
>
> diff --git a/fs/xfs/xfs_fsverity.c b/fs/xfs/xfs_fsverity.c
> index f53a404578..06eac2561b 100644
> --- a/fs/xfs/xfs_fsverity.c
> +++ b/fs/xfs/xfs_fsverity.c
> @@ -102,6 +102,8 @@
> uint32_t blocksize = i_blocksize(VFS_I(ip));
> xfs_fileoff_t last_block;
>
> + trace_xfs_fsverity_get_descriptor(ip);
> +
> ASSERT(inode->i_flags & S_VERITY);
> error = xfs_bmap_last_extent(NULL, ip, XFS_DATA_FORK, &rec, &is_empty);
> if (error)
> @@ -330,6 +332,8 @@
> pgoff_t offset =
> index | (XFS_FSVERITY_REGION_START >> PAGE_SHIFT);
>
> + trace_xfs_fsverity_read_merkle(XFS_I(inode), offset, PAGE_SIZE);
> +
> folio = __filemap_get_folio(inode->i_mapping, offset, FGP_ACCESSED, 0);
> if (IS_ERR(folio) || !folio_test_uptodate(folio)) {
> DEFINE_READAHEAD(ractl, NULL, NULL, inode->i_mapping, offset);
> @@ -358,6 +362,8 @@
> struct xfs_inode *ip = XFS_I(inode);
> loff_t position = pos | XFS_FSVERITY_REGION_START;
>
> + trace_xfs_fsverity_write_merkle(XFS_I(inode), pos, size);
> +
> if (position + size > inode->i_sb->s_maxbytes)
> return -EFBIG;
>
> @@ -370,6 +376,8 @@
> loff_t pos,
> size_t len)
> {
> + trace_xfs_fsverity_file_corrupt(XFS_I(inode), pos, len);
> +
> xfs_inode_mark_sick(XFS_I(inode), XFS_SICK_INO_DATA);
> }
>
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index f70afbf3cb..1ce4e10b6b 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -5906,6 +5906,52 @@
> DEFINE_FREEBLOCKS_RESV_EVENT(xfs_freecounter_reserved);
> DEFINE_FREEBLOCKS_RESV_EVENT(xfs_freecounter_enospc);
>
> +TRACE_EVENT(xfs_fsverity_get_descriptor,
> + TP_PROTO(struct xfs_inode *ip),
> + TP_ARGS(ip),
> + TP_STRUCT__entry(
> + __field(dev_t, dev)
> + __field(xfs_ino_t, ino)
> + ),
> + TP_fast_assign(
> + __entry->dev = VFS_I(ip)->i_sb->s_dev;
> + __entry->ino = ip->i_ino;
> + ),
> + TP_printk("dev %d:%d ino 0x%llx",
> + MAJOR(__entry->dev), MINOR(__entry->dev),
> + __entry->ino)
> +);
> +
> +DECLARE_EVENT_CLASS(xfs_fsverity_class,
> + TP_PROTO(struct xfs_inode *ip, u64 pos, unsigned int length),
> + TP_ARGS(ip, pos, length),
> + TP_STRUCT__entry(
> + __field(dev_t, dev)
> + __field(xfs_ino_t, ino)
> + __field(u64, pos)
> + __field(unsigned int, length)
> + ),
> + TP_fast_assign(
> + __entry->dev = VFS_I(ip)->i_sb->s_dev;
> + __entry->ino = ip->i_ino;
> + __entry->pos = pos;
> + __entry->length = length;
> + ),
> + TP_printk("dev %d:%d ino 0x%llx pos %llx length %x",
pos and length should have '0x' before them since they're presented in
hexadecimal.
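
To illustrate the point, here is a small user-space sketch (not the
tracepoint itself; format_fsverity_event() is a hypothetical helper) that
prints both values with the '0x' prefix, so a reader of the trace output
cannot mistake them for decimal:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: format pos and length the way the review asks,
 * with an explicit "0x" in front of each hexadecimal value.
 * Returns the number of characters snprintf() would have written.
 */
static int format_fsverity_event(char *buf, size_t buflen,
				 unsigned long long pos, unsigned int length)
{
	assert(buf != NULL && buflen > 0);
	return snprintf(buf, buflen, "pos 0x%llx length 0x%x", pos, length);
}
```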
Otherwise this looks ok so
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
--D
> + MAJOR(__entry->dev), MINOR(__entry->dev),
> + __entry->ino,
> + __entry->pos,
> + __entry->length)
> +)
> +
> +#define DEFINE_FSVERITY_EVENT(name) \
> +DEFINE_EVENT(xfs_fsverity_class, name, \
> + TP_PROTO(struct xfs_inode *ip, u64 pos, unsigned int length), \
> + TP_ARGS(ip, pos, length))
> +DEFINE_FSVERITY_EVENT(xfs_fsverity_read_merkle);
> +DEFINE_FSVERITY_EVENT(xfs_fsverity_write_merkle);
> +DEFINE_FSVERITY_EVENT(xfs_fsverity_file_corrupt);
> +
> #endif /* _TRACE_XFS_H */
>
> #undef TRACE_INCLUDE_PATH
>
> --
> - Andrey
>
>
^ permalink raw reply [flat|nested] 86+ messages in thread
* [PATCH v2 22/22] xfs: enable ro-compat fs-verity flag
2026-01-12 14:49 [PATCH v2 0/23] fs-verity support for XFS with post EOF merkle tree Andrey Albershteyn
` (20 preceding siblings ...)
2026-01-12 14:52 ` [PATCH v2 21/22] xfs: add fsverity traces Andrey Albershteyn
@ 2026-01-12 14:52 ` Andrey Albershteyn
2026-01-13 16:36 ` [PATCH v2 0/23] fs-verity support for XFS with post EOF merkle tree Matthew Wilcox
22 siblings, 0 replies; 86+ messages in thread
From: Andrey Albershteyn @ 2026-01-12 14:52 UTC (permalink / raw)
To: fsverity, linux-xfs, ebiggers, linux-fsdevel, aalbersh, aalbersh,
djwong
Cc: djwong, david, hch
Finalize fs-verity integration in XFS by making the kernel fs-verity
aware via the ro-compat flag.
Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: add spaces]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
fs/xfs/libxfs/xfs_format.h | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index d67b404964..f5e43909f0 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -378,8 +378,9 @@
#define XFS_SB_FEAT_RO_COMPAT_ALL \
(XFS_SB_FEAT_RO_COMPAT_FINOBT | \
XFS_SB_FEAT_RO_COMPAT_RMAPBT | \
- XFS_SB_FEAT_RO_COMPAT_REFLINK| \
- XFS_SB_FEAT_RO_COMPAT_INOBTCNT)
+ XFS_SB_FEAT_RO_COMPAT_REFLINK | \
+ XFS_SB_FEAT_RO_COMPAT_INOBTCNT | \
+ XFS_SB_FEAT_RO_COMPAT_VERITY)
#define XFS_SB_FEAT_RO_COMPAT_UNKNOWN ~XFS_SB_FEAT_RO_COMPAT_ALL
static inline bool
xfs_sb_has_ro_compat_feature(
--
- Andrey
^ permalink raw reply related [flat|nested] 86+ messages in thread
* Re: [PATCH v2 0/23] fs-verity support for XFS with post EOF merkle tree
2026-01-12 14:49 [PATCH v2 0/23] fs-verity support for XFS with post EOF merkle tree Andrey Albershteyn
` (21 preceding siblings ...)
2026-01-12 14:52 ` [PATCH v2 22/22] xfs: enable ro-compat fs-verity flag Andrey Albershteyn
@ 2026-01-13 16:36 ` Matthew Wilcox
2026-01-13 18:45 ` Andrey Albershteyn
22 siblings, 1 reply; 86+ messages in thread
From: Matthew Wilcox @ 2026-01-13 16:36 UTC (permalink / raw)
To: Andrey Albershteyn
Cc: fsverity, linux-xfs, ebiggers, linux-fsdevel, aalbersh, djwong,
david, hch
On Mon, Jan 12, 2026 at 03:49:44PM +0100, Andrey Albershteyn wrote:
> The tree is read by iomap into page cache at offset 1 << 53. This is far
> enough to handle any supported file size.
What happens on 32-bit systems? (I presume you mean "offset" as
"index", so this is 1 << 65 bytes on machines with a 4KiB page size)
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH v2 0/23] fs-verity support for XFS with post EOF merkle tree
2026-01-13 16:36 ` [PATCH v2 0/23] fs-verity support for XFS with post EOF merkle tree Matthew Wilcox
@ 2026-01-13 18:45 ` Andrey Albershteyn
2026-01-14 5:00 ` Matthew Wilcox
0 siblings, 1 reply; 86+ messages in thread
From: Andrey Albershteyn @ 2026-01-13 18:45 UTC (permalink / raw)
To: Matthew Wilcox
Cc: fsverity, linux-xfs, ebiggers, linux-fsdevel, aalbersh, djwong,
david, hch
On 2026-01-13 16:36:44, Matthew Wilcox wrote:
> On Mon, Jan 12, 2026 at 03:49:44PM +0100, Andrey Albershteyn wrote:
> > The tree is read by iomap into page cache at offset 1 << 53. This is far
> > enough to handle any supported file size.
>
> What happens on 32-bit systems? (I presume you mean "offset" as
> "index", so this is 1 << 65 bytes on machines with a 4KiB page size)
>
it's in bytes, yeah I missed 32-bit systems, I think I will try to
convert this offset to something lower on 32-bit in iomap, as
Darrick suggested.
--
- Andrey
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH v2 0/23] fs-verity support for XFS with post EOF merkle tree
2026-01-13 18:45 ` Andrey Albershteyn
@ 2026-01-14 5:00 ` Matthew Wilcox
2026-01-14 6:15 ` Darrick J. Wong
0 siblings, 1 reply; 86+ messages in thread
From: Matthew Wilcox @ 2026-01-14 5:00 UTC (permalink / raw)
To: Andrey Albershteyn
Cc: fsverity, linux-xfs, ebiggers, linux-fsdevel, aalbersh, djwong,
david, hch
On Tue, Jan 13, 2026 at 07:45:47PM +0100, Andrey Albershteyn wrote:
> On 2026-01-13 16:36:44, Matthew Wilcox wrote:
> > On Mon, Jan 12, 2026 at 03:49:44PM +0100, Andrey Albershteyn wrote:
> > > The tree is read by iomap into page cache at offset 1 << 53. This is far
> > > enough to handle any supported file size.
> >
> > What happens on 32-bit systems? (I presume you mean "offset" as
> > "index", so this is 1 << 65 bytes on machines with a 4KiB page size)
> >
> it's in bytes, yeah I missed 32-bit systems, I think I will try to
> convert this offset to something lower on 32-bit in iomap, as
> Darrick suggested.
Hm, we use all 32 bits of folio->index on 32-bit platforms. That's
MAX_LFS_FILESIZE. Are you proposing reducing that?
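A rough userspace sketch of the arithmetic behind this concern, assuming 4KiB pages. The helper name and SKETCH_PAGE_SHIFT are made up for illustration and are not kernel API; the point is only that the page holding byte offset 1 << 53 has index 1 << 41, which a 32-bit index cannot represent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4KiB pages. Real kernels take PAGE_SHIFT from the arch. */
#define SKETCH_PAGE_SHIFT 12

/*
 * Hypothetical helper: can the page containing byte offset `pos` be
 * addressed by a page cache index that is `index_bits` wide (32 on
 * 32-bit platforms, 64 on 64-bit ones)?
 */
static bool pos_fits_page_index(uint64_t pos, unsigned int index_bits)
{
	uint64_t index = pos >> SKETCH_PAGE_SHIFT;

	assert(index_bits >= 1 && index_bits <= 64);
	if (index_bits == 64)
		return true;
	return index < (UINT64_C(1) << index_bits);
}
```

Under these assumptions, pos_fits_page_index(1ULL << 53, 32) is false while the same offset with 64 index bits is fine; the largest byte offset that still fits in 32 index bits is (1ULL << 44) - 1.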
There are some other (performance) penalties to using 1<<53 as the lowest
index for metadata on 64-bit. The radix tree is going to go quite high;
we use 6 bits at each level, so if you have a folio at 0 and a folio at
1<<53, you'll have a tree of height 9 and use 17 nodes.
That's going to be a lot of extra cache misses when walking the XArray
to find any given folio. Allowing the filesystem to decide where the
metadata starts for any given file really is an important optimisation.
Even if it starts at index 1<<29, you'll almost halve the number of
nodes needed.
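The numbers above can be reproduced with a small userspace sketch. It is not the XArray implementation; it only assumes 6 index bits per level and that a folio at index 0 and a folio at a high index share nothing but the root node. Both function names are hypothetical.

```c
#include <assert.h>

/* Assumption taken from the text above: 6 index bits per tree level. */
#define SKETCH_CHUNK_SHIFT 6

/* Levels needed so that `index` is addressable from the root. */
static unsigned int sketch_levels(unsigned long long index)
{
	unsigned int levels = 1;

	/* The first test keeps the shift count below 64 bits. */
	while (SKETCH_CHUNK_SHIFT * levels < 64 &&
	       (index >> (SKETCH_CHUNK_SHIFT * levels)) != 0)
		levels++;
	return levels;
}

/*
 * Nodes used when only index 0 and `high_index` are populated: one
 * shared root plus one node per remaining level on each of two paths.
 */
static unsigned int sketch_nodes_for_pair(unsigned long long high_index)
{
	unsigned int levels = sketch_levels(high_index);

	/* levels is at least 1, so the subtraction cannot underflow. */
	assert(levels >= 1);
	return 1 + 2 * (levels - 1);
}
```

With these assumptions, an index of 1ULL << 53 gives 9 levels and 17 nodes, and 1ULL << 29 gives 5 levels and 9 nodes, which is the "almost halve" case.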
Adding this ability to support RW merkle trees is certainly coming at
a cost. Is it worth it? I haven't seen a user need for that articulated,
but I haven't been paying close attention.
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH v2 0/23] fs-verity support for XFS with post EOF merkle tree
2026-01-14 5:00 ` Matthew Wilcox
@ 2026-01-14 6:15 ` Darrick J. Wong
2026-01-14 8:20 ` Andrey Albershteyn
0 siblings, 1 reply; 86+ messages in thread
From: Darrick J. Wong @ 2026-01-14 6:15 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Andrey Albershteyn, fsverity, linux-xfs, ebiggers, linux-fsdevel,
aalbersh, david, hch
On Wed, Jan 14, 2026 at 05:00:47AM +0000, Matthew Wilcox wrote:
> On Tue, Jan 13, 2026 at 07:45:47PM +0100, Andrey Albershteyn wrote:
> > On 2026-01-13 16:36:44, Matthew Wilcox wrote:
> > > On Mon, Jan 12, 2026 at 03:49:44PM +0100, Andrey Albershteyn wrote:
> > > > The tree is read by iomap into page cache at offset 1 << 53. This is far
> > > > enough to handle any supported file size.
> > >
> > > What happens on 32-bit systems? (I presume you mean "offset" as
> > > "index", so this is 1 << 65 bytes on machines with a 4KiB page size)
> > >
> > it's in bytes, yeah I missed 32-bit systems, I think I will try to
> > convert this offset to something lower on 32-bit in iomap, as
> > Darrick suggested.
>
> Hm, we use all 32 bits of folio->index on 32-bit platforms. That's
> MAX_LFS_FILESIZE. Are you proposing reducing that?
>
> There are some other (performance) penalties to using 1<<53 as the lowest
> index for metadata on 64-bit. The radix tree is going to go quite high;
> we use 6 bits at each level, so if you have a folio at 0 and a folio at
> 1<<53, you'll have a tree of height 9 and use 17 nodes.
>
> That's going to be a lot of extra cache misses when walking the XArray
> to find any given folio. Allowing the filesystem to decide where the
> metadata starts for any given file really is an important optimisation.
> Even if it starts at index 1<<29, you'll almost halve the number of
> nodes needed.
1<<53 is only the location of the fsverity metadata in the ondisk
mapping. For the incore mapping, in theory we could load the fsverity
anywhere in the post-EOF part of the pagecache to save some bits.
roundup(i_size_read(), 1<<folio_max_order) would work, right?
> Adding this ability to support RW merkle trees is certainly coming at
> a cost. Is it worth it? I haven't seen a user need for that articulated,
> but I haven't been paying close attention.
I think the pagecache writes of fsverity metadata are only performed
once, while enabling fsverity for a file.
--D
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH v2 0/23] fs-verity support for XFS with post EOF merkle tree
2026-01-14 6:15 ` Darrick J. Wong
@ 2026-01-14 8:20 ` Andrey Albershteyn
2026-01-14 9:53 ` Andrey Albershteyn
0 siblings, 1 reply; 86+ messages in thread
From: Andrey Albershteyn @ 2026-01-14 8:20 UTC (permalink / raw)
To: Darrick J. Wong
Cc: Matthew Wilcox, fsverity, linux-xfs, ebiggers, linux-fsdevel,
aalbersh, david, hch
On 2026-01-13 22:15:36, Darrick J. Wong wrote:
> On Wed, Jan 14, 2026 at 05:00:47AM +0000, Matthew Wilcox wrote:
> > On Tue, Jan 13, 2026 at 07:45:47PM +0100, Andrey Albershteyn wrote:
> > > On 2026-01-13 16:36:44, Matthew Wilcox wrote:
> > > > On Mon, Jan 12, 2026 at 03:49:44PM +0100, Andrey Albershteyn wrote:
> > > > > The tree is read by iomap into page cache at offset 1 << 53. This is far
> > > > > enough to handle any supported file size.
> > > >
> > > > What happens on 32-bit systems? (I presume you mean "offset" as
> > > > "index", so this is 1 << 65 bytes on machines with a 4KiB page size)
> > > >
> > > it's in bytes, yeah I missed 32-bit systems, I think I will try to
> > > convert this offset to something lower on 32-bit in iomap, as
> > > Darrick suggested.
> >
> > Hm, we use all 32 bits of folio->index on 32-bit platforms. That's
> > MAX_LFS_FILESIZE. Are you proposing reducing that?
> >
> > There are some other (performance) penalties to using 1<<53 as the lowest
> > index for metadata on 64-bit. The radix tree is going to go quite high;
> > we use 6 bits at each level, so if you have a folio at 0 and a folio at
> > 1<<53, you'll have a tree of height 9 and use 17 nodes.
> >
> > That's going to be a lot of extra cache misses when walking the XArray
> > to find any given folio. Allowing the filesystem to decide where the
> > metadata starts for any given file really is an important optimisation.
> > Even if it starts at index 1<<29, you'll almost halve the number of
> > nodes needed.
Thanks for this overview!
>
> 1<<53 is only the location of the fsverity metadata in the ondisk
> mapping. For the incore mapping, in theory we could load the fsverity
> anywhere in the post-EOF part of the pagecache to save some bits.
>
> roundup(i_size_read(), 1<<folio_max_order) would work, right?
Then there's probably no benefit to having the ondisk mapping differ,
no?
>
> > Adding this ability to support RW merkle trees is certainly coming at
> > a cost. Is it worth it? I haven't seen a user need for that articulated,
> > but I haven't been paying close attention.
>
> I think the pagecache writes of fsverity metadata are only performed
> once, while enabling fsverity for a file.
yes
--
- Andrey
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH v2 0/23] fs-verity support for XFS with post EOF merkle tree
2026-01-14 8:20 ` Andrey Albershteyn
@ 2026-01-14 9:53 ` Andrey Albershteyn
2026-01-14 16:42 ` Darrick J. Wong
2026-01-19 6:33 ` fsverity metadata offset, was: " Christoph Hellwig
0 siblings, 2 replies; 86+ messages in thread
From: Andrey Albershteyn @ 2026-01-14 9:53 UTC (permalink / raw)
To: Darrick J. Wong
Cc: Matthew Wilcox, fsverity, linux-xfs, ebiggers, linux-fsdevel,
aalbersh, david, hch
On 2026-01-14 09:20:34, Andrey Albershteyn wrote:
> On 2026-01-13 22:15:36, Darrick J. Wong wrote:
> > On Wed, Jan 14, 2026 at 05:00:47AM +0000, Matthew Wilcox wrote:
> > > On Tue, Jan 13, 2026 at 07:45:47PM +0100, Andrey Albershteyn wrote:
> > > > On 2026-01-13 16:36:44, Matthew Wilcox wrote:
> > > > > On Mon, Jan 12, 2026 at 03:49:44PM +0100, Andrey Albershteyn wrote:
> > > > > > The tree is read by iomap into page cache at offset 1 << 53. This is far
> > > > > > enough to handle any supported file size.
> > > > >
> > > > > What happens on 32-bit systems? (I presume you mean "offset" as
> > > > > "index", so this is 1 << 65 bytes on machines with a 4KiB page size)
> > > > >
> > > > it's in bytes, yeah I missed 32-bit systems, I think I will try to
> > > > convert this offset to something lower on 32-bit in iomap, as
> > > > Darrick suggested.
> > >
> > > Hm, we use all 32 bits of folio->index on 32-bit platforms. That's
> > > MAX_LFS_FILESIZE. Are you proposing reducing that?
> > >
> > > There are some other (performance) penalties to using 1<<53 as the lowest
> > > index for metadata on 64-bit. The radix tree is going to go quite high;
> > > we use 6 bits at each level, so if you have a folio at 0 and a folio at
> > > 1<<53, you'll have a tree of height 9 and use 17 nodes.
> > >
> > > That's going to be a lot of extra cache misses when walking the XArray
> > > to find any given folio. Allowing the filesystem to decide where the
> > > metadata starts for any given file really is an important optimisation.
> > > Even if it starts at index 1<<29, you'll almost halve the number of
> > > nodes needed.
>
> Thanks for this overview!
>
> >
> > 1<<53 is only the location of the fsverity metadata in the ondisk
> > mapping. For the incore mapping, in theory we could load the fsverity
> > anywhere in the post-EOF part of the pagecache to save some bits.
> >
> > roundup(i_size_read(), 1<<folio_max_order) would work, right?
>
> Then there's probably no benefit to having the ondisk mapping differ,
> no?
oh, the fixed ondisk offset helps avoid breakage if the filesystem
is mounted by a machine with a different page size.
--
- Andrey
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH v2 0/23] fs-verity support for XFS with post EOF merkle tree
2026-01-14 9:53 ` Andrey Albershteyn
@ 2026-01-14 16:42 ` Darrick J. Wong
2026-01-19 6:33 ` fsverity metadata offset, was: " Christoph Hellwig
1 sibling, 0 replies; 86+ messages in thread
From: Darrick J. Wong @ 2026-01-14 16:42 UTC (permalink / raw)
To: Andrey Albershteyn
Cc: Matthew Wilcox, fsverity, linux-xfs, ebiggers, linux-fsdevel,
aalbersh, david, hch
On Wed, Jan 14, 2026 at 10:53:00AM +0100, Andrey Albershteyn wrote:
> On 2026-01-14 09:20:34, Andrey Albershteyn wrote:
> > On 2026-01-13 22:15:36, Darrick J. Wong wrote:
> > > On Wed, Jan 14, 2026 at 05:00:47AM +0000, Matthew Wilcox wrote:
> > > > On Tue, Jan 13, 2026 at 07:45:47PM +0100, Andrey Albershteyn wrote:
> > > > > On 2026-01-13 16:36:44, Matthew Wilcox wrote:
> > > > > > On Mon, Jan 12, 2026 at 03:49:44PM +0100, Andrey Albershteyn wrote:
> > > > > > > The tree is read by iomap into page cache at offset 1 << 53. This is far
> > > > > > > enough to handle any supported file size.
> > > > > >
> > > > > > What happens on 32-bit systems? (I presume you mean "offset" as
> > > > > > "index", so this is 1 << 65 bytes on machines with a 4KiB page size)
> > > > > >
> > > > > it's in bytes, yeah I missed 32-bit systems, I think I will try to
> > > > > convert this offset to something lower on 32-bit in iomap, as
> > > > > Darrick suggested.
> > > >
> > > > Hm, we use all 32 bits of folio->index on 32-bit platforms. That's
> > > > MAX_LFS_FILESIZE. Are you proposing reducing that?
> > > >
> > > > There are some other (performance) penalties to using 1<<53 as the lowest
> > > > index for metadata on 64-bit. The radix tree is going to go quite high;
> > > > we use 6 bits at each level, so if you have a folio at 0 and a folio at
> > > > 1<<53, you'll have a tree of height 9 and use 17 nodes.
> > > >
> > > > That's going to be a lot of extra cache misses when walking the XArray
> > > > to find any given folio. Allowing the filesystem to decide where the
> > > > metadata starts for any given file really is an important optimisation.
> > > > Even if it starts at index 1<<29, you'll almost halve the number of
> > > > nodes needed.
> >
> > Thanks for this overview!
> >
> > >
> > > 1<<53 is only the location of the fsverity metadata in the ondisk
> > > mapping. For the incore mapping, in theory we could load the fsverity
> > > anywhere in the post-EOF part of the pagecache to save some bits.
> > >
> > > roundup(i_size_read(), 1<<folio_max_order) would work, right?
> >
> > Then there's probably no benefit to having the ondisk mapping differ,
> > no?
>
> oh, the fixed ondisk offset helps avoid breakage if the filesystem
> is mounted by a machine with a different page size.
Yeah. The i(ncore)ext(ent) tree won't have excessive nodes if you pick
a high file offset, so 1<<53 is ok for the ondisk file offset.
--D
> --
> - Andrey
>
^ permalink raw reply [flat|nested] 86+ messages in thread
* fsverity metadata offset, was: Re: [PATCH v2 0/23] fs-verity support for XFS with post EOF merkle tree
2026-01-14 9:53 ` Andrey Albershteyn
2026-01-14 16:42 ` Darrick J. Wong
@ 2026-01-19 6:33 ` Christoph Hellwig
2026-01-19 19:32 ` Eric Biggers
1 sibling, 1 reply; 86+ messages in thread
From: Christoph Hellwig @ 2026-01-19 6:33 UTC (permalink / raw)
To: Andrey Albershteyn
Cc: Darrick J. Wong, Matthew Wilcox, fsverity, linux-xfs, ebiggers,
linux-fsdevel, aalbersh, david, hch, tytso, linux-ext4, jaegeuk,
chao, linux-f2fs-devel
While looking at fsverity I'd like to understand the choice of offset
in ext4 and f2fs, and wonder about an issue.
Both ext4 and f2fs round up the inode size to the next 64k boundary
and place the metadata there. Both use the 65536 magic number for that
instead of a well documented constant unfortunately.
I assume this was picked to align up to the largest reasonable page
size? Unfortunately for that:
a) not all architectures are reasonable. As Darrick pointed out
hexagon seems to support page size up to 1MiB. While I don't know
if they exist in real life, powerpc supports up to 256kiB pages,
and I know they are used for real in various embedded settings
b) with large folio support in the page cache, the folios used to
map files can be much larger than the base page size, with all
the same issues as a larger page size
So assuming that fsverity is trying to avoid the issue of a page/folio
that covers both data and fsverity metadata, how does it cope with that?
Do we need to disable fsverity on > 64k page size and disable large
folios on fsverity files? The latter would mean writing back all cached
data first as well.
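To make the concern concrete, here is a hedged userspace sketch. It assumes folios are naturally aligned to their own size and that the metadata starts at the inode size rounded up to a fixed alignment (64k for ext4 and f2fs, as described above). Both helpers are hypothetical names, not existing kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Round x up to the next multiple of align (align must be nonzero). */
static uint64_t round_up_to(uint64_t x, uint64_t align)
{
	assert(align != 0);
	return (x + align - 1) / align * align;
}

/*
 * Does the naturally aligned folio that holds the last data byte also
 * hold the first byte of the fsverity metadata?
 */
static bool folio_mixes_data_and_metadata(uint64_t i_size,
					  uint64_t meta_align,
					  uint64_t folio_size)
{
	uint64_t meta_start = round_up_to(i_size, meta_align);

	assert(i_size > 0 && folio_size != 0);
	return (i_size - 1) / folio_size == meta_start / folio_size;
}
```

For a 100000 byte file with 64k metadata alignment, a 64k folio keeps data and metadata apart, while a 256k folio covers both; that second case is the one the questions above are about.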
And going forward, should we have a v2 format that fixes this? For that
we'd still need a maximum folio size of course. And of course I'd like
to get all these things right from the start in XFS, while still being as
similar as possible to ext4/f2fs.
On Wed, Jan 14, 2026 at 10:53:00AM +0100, Andrey Albershteyn wrote:
> On 2026-01-14 09:20:34, Andrey Albershteyn wrote:
> > On 2026-01-13 22:15:36, Darrick J. Wong wrote:
> > > On Wed, Jan 14, 2026 at 05:00:47AM +0000, Matthew Wilcox wrote:
> > > > On Tue, Jan 13, 2026 at 07:45:47PM +0100, Andrey Albershteyn wrote:
> > > > > On 2026-01-13 16:36:44, Matthew Wilcox wrote:
> > > > > > On Mon, Jan 12, 2026 at 03:49:44PM +0100, Andrey Albershteyn wrote:
> > > > > > > The tree is read by iomap into page cache at offset 1 << 53. This is far
> > > > > > > enough to handle any supported file size.
> > > > > >
> > > > > > What happens on 32-bit systems? (I presume you mean "offset" as
> > > > > > "index", so this is 1 << 65 bytes on machines with a 4KiB page size)
> > > > > >
> > > > > it's in bytes, yeah I missed 32-bit systems, I think I will try to
> > > > > convert this offset to something lower on 32-bit in iomap, as
> > > > > Darrick suggested.
> > > >
> > > > Hm, we use all 32 bits of folio->index on 32-bit platforms. That's
> > > > MAX_LFS_FILESIZE. Are you proposing reducing that?
> > > >
> > > > There are some other (performance) penalties to using 1<<53 as the lowest
> > > > index for metadata on 64-bit. The radix tree is going to go quite high;
> > > > we use 6 bits at each level, so if you have a folio at 0 and a folio at
> > > > 1<<53, you'll have a tree of height 9 and use 17 nodes.
> > > >
> > > > That's going to be a lot of extra cache misses when walking the XArray
> > > > to find any given folio. Allowing the filesystem to decide where the
> > > > metadata starts for any given file really is an important optimisation.
> > > > Even if it starts at index 1<<29, you'll almost halve the number of
> > > > nodes needed.
> >
> > Thanks for this overview!
> >
> > >
> > > 1<<53 is only the location of the fsverity metadata in the ondisk
> > > mapping. For the incore mapping, in theory we could load the fsverity
> > > anywhere in the post-EOF part of the pagecache to save some bits.
> > >
> > > roundup(i_size_read(), 1<<folio_max_order) would work, right?
> >
> > Then there's probably no benefit to having the ondisk mapping differ,
> > no?
>
> oh, the fixed ondisk offset helps avoid breakage if the filesystem
> is mounted by a machine with a different page size.
>
> --
> - Andrey
---end quoted text---
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: fsverity metadata offset, was: Re: [PATCH v2 0/23] fs-verity support for XFS with post EOF merkle tree
2026-01-19 6:33 ` fsverity metadata offset, was: " Christoph Hellwig
@ 2026-01-19 19:32 ` Eric Biggers
2026-01-19 19:58 ` Darrick J. Wong
2026-01-19 20:00 ` Matthew Wilcox
0 siblings, 2 replies; 86+ messages in thread
From: Eric Biggers @ 2026-01-19 19:32 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Andrey Albershteyn, Darrick J. Wong, Matthew Wilcox, fsverity,
linux-xfs, linux-fsdevel, aalbersh, david, tytso, linux-ext4,
jaegeuk, chao, linux-f2fs-devel
On Mon, Jan 19, 2026 at 07:33:49AM +0100, Christoph Hellwig wrote:
> While looking at fsverity I'd like to understand the choice of offset
> in ext4 and f2fs, and wonder about an issue.
>
> Both ext4 and f2fs round up the inode size to the next 64k boundary
> and place the metadata there. Both use the 65536 magic number for that
> instead of a well documented constant unfortunately.
>
> I assume this was picked to align up to the largest reasonable page
> size? Unfortunately for that:
>
> a) not all architectures are reasonable. As Darrick pointed out
> hexagon seems to support page size up to 1MiB. While I don't know
> if they exist in real life, powerpc supports up to 256kiB pages,
> and I know they are used for real in various embedded settings
> b) with large folio support in the page cache, the folios used to
> map files can be much larger than the base page size, with all
> the same issues as a larger page size
>
> So assuming that fsverity is trying to avoid the issue of a page/folio
> that covers both data and fsverity metadata, how does it cope with that?
> Do we need to disable fsverity on > 64k page size and disable large
> folios on fsverity files? The latter would mean writing back all cached
> data first as well.
>
> And going forward, should we have a v2 format that fixes this? For that
> we'd still need a maximum folio size of course. And of course I'd like
> to get all these things right from the start in XFS, while still being as
> similar as possible to ext4/f2fs.
Yes, if I recall correctly it was intended to be the "largest reasonable
page size". It looks like PAGE_SIZE > 65536 can't work as-is, so indeed
we should disable fsverity support in that configuration.
I don't think large folios are quite as problematic.
ext4_read_merkle_tree_page() and f2fs_read_merkle_tree_page() read a
folio and return the appropriate page in it, and fs/verity/verify.c
operates on the page. If it's a page in the folio that spans EOF, I
think everything will actually still work, except userspace will be able
to see Merkle tree data after a 64K boundary past EOF if the file is
mmapped using huge pages.
The mmap issue isn't great, but I'm not sure how much it matters,
especially when the zeroes do still go up to a 64K boundary.
If we do need to fix this, there are a couple things we could consider
doing without changing the on-disk format in ext4 or f2fs: putting the
data in the page cache at a different offset than it exists on-disk, or
using "small" pages for EOF specifically.
But yes, XFS should choose a larger alignment than 64K.
- Eric
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: fsverity metadata offset, was: Re: [PATCH v2 0/23] fs-verity support for XFS with post EOF merkle tree
2026-01-19 19:32 ` Eric Biggers
@ 2026-01-19 19:58 ` Darrick J. Wong
2026-01-20 7:32 ` Christoph Hellwig
2026-01-19 20:00 ` Matthew Wilcox
1 sibling, 1 reply; 86+ messages in thread
From: Darrick J. Wong @ 2026-01-19 19:58 UTC (permalink / raw)
To: Eric Biggers
Cc: Christoph Hellwig, Andrey Albershteyn, Matthew Wilcox, fsverity,
linux-xfs, linux-fsdevel, aalbersh, david, tytso, linux-ext4,
jaegeuk, chao, linux-f2fs-devel
On Mon, Jan 19, 2026 at 11:32:42AM -0800, Eric Biggers wrote:
> On Mon, Jan 19, 2026 at 07:33:49AM +0100, Christoph Hellwig wrote:
> > While looking at fsverity I'd like to understand the choice of offset
> > in ext4 and f2fs, and wonder about an issue.
> >
> > Both ext4 and f2fs round up the inode size to the next 64k boundary
> > and place the metadata there. Both use the 65536 magic number for that
> > instead of a well documented constant unfortunately.
> >
> > I assume this was picked to align up to the largest reasonable page
> > size? Unfortunately for that:
> >
> > a) not all architectures are reasonable. As Darrick pointed out
> > hexagon seems to support page size up to 1MiB. While I don't know
> > if they exist in real life, powerpc supports up to 256kiB pages,
> > and I know they are used for real in various embedded settings
They *did* way back in the day, I worked with some seekrit PPC440s early
in my career. I don't know that any of them still exist, but the code
is still there...
> > b) with large folio support in the page cache, the folios used to
> > map files can be much larger than the base page size, with all
> > the same issues as a larger page size
> >
> > So assuming that fsverity is trying to avoid the issue of a page/folio
> > that covers both data and fsverity metadata, how does it cope with that?
> > Do we need to disable fsverity on > 64k page size and disable large
> > folios on fsverity files? The latter would mean writing back all cached
> > data first as well.
> >
> > And going forward, should we have a v2 format that fixes this? For that
> > we'd still need a maximum folio size of course. And of course I'd like
> > to get all these things right from the start in XFS, while still being as
> > similar as possible to ext4/f2fs.
>
> Yes, if I recall correctly it was intended to be the "largest reasonable
> page size". It looks like PAGE_SIZE > 65536 can't work as-is, so indeed
> we should disable fsverity support in that configuration.
>
> I don't think large folios are quite as problematic.
> ext4_read_merkle_tree_page() and f2fs_read_merkle_tree_page() read a
> folio and return the appropriate page in it, and fs/verity/verify.c
> operates on the page. If it's a page in the folio that spans EOF, I
> think everything will actually still work, except userspace will be able
> to see Merkle tree data after a 64K boundary past EOF if the file is
> mmapped using huge pages.
We don't allow mmapping file data beyond the EOF basepage, even if the
underlying folio is a large folio. See generic/749, though recently
Kiryl Shutsemau tried to remove that restriction[1], until dchinner and
willy told him no.
> The mmap issue isn't great, but I'm not sure how much it matters,
> especially when the zeroes do still go up to a 64K boundary.
I'm concerned that post-eof zeroing of a 256k folio could accidentally
obliterate merkle tree content that was somehow previously loaded.
Though afaict from the existing codebases, none of them actually make
that mistake.
> If we do need to fix this, there are a couple things we could consider
> doing without changing the on-disk format in ext4 or f2fs: putting the
> data in the page cache at a different offset than it exists on-disk, or
> using "small" pages for EOF specifically.
I'd leave the ondisk offset as-is, but change the pagecache offset to
roundup(i_size_read(), mapping_max_folio_size_supported()) just to keep
file data and fsverity metadata completely separate.
> But yes, XFS should choose a larger alignment than 64K.
The roundup() formula above is what I'd choose for the pagecache offset
for xfs. The ondisk offset of 1<<53 is ok with me.
--D
[1] https://lore.kernel.org/linux-fsdevel/20251014175214.GW6188@frogsfrogsfrogs/
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: fsverity metadata offset, was: Re: [PATCH v2 0/23] fs-verity support for XFS with post EOF merkle tree
2026-01-19 19:58 ` Darrick J. Wong
@ 2026-01-20 7:32 ` Christoph Hellwig
2026-01-20 11:44 ` Andrey Albershteyn
0 siblings, 1 reply; 86+ messages in thread
From: Christoph Hellwig @ 2026-01-20 7:32 UTC (permalink / raw)
To: Darrick J. Wong
Cc: Eric Biggers, Christoph Hellwig, Andrey Albershteyn,
Matthew Wilcox, fsverity, linux-xfs, linux-fsdevel, aalbersh,
david, tytso, linux-ext4, jaegeuk, chao, linux-f2fs-devel
On Mon, Jan 19, 2026 at 11:58:16AM -0800, Darrick J. Wong wrote:
> > > a) not all architectures are reasonable. As Darrick pointed out
> > > hexagon seems to support page size up to 1MiB. While I don't know
> > > if they exist in real life, powerpc supports up to 256kiB pages,
> > > and I know they are used for real in various embedded settings
>
> They *did* way back in the day, I worked with some seekrit PPC440s early
> in my career. I don't know that any of them still exist, but the code
> is still there...
Sorry, I meant I don't really know how real the hexagon large page
sizes are. I know about the ppcs one personally, too.
> > If we do need to fix this, there are a couple things we could consider
> > doing without changing the on-disk format in ext4 or f2fs: putting the
> > data in the page cache at a different offset than it exists on-disk, or
> > using "small" pages for EOF specifically.
>
> I'd leave the ondisk offset as-is, but change the pagecache offset to
> roundup(i_size_read(), mapping_max_folio_size_supported()) just to keep
> file data and fsverity metadata completely separate.
Can we find a way to do that in common code and make ext4 and f2fs do
the same?
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: fsverity metadata offset, was: Re: [PATCH v2 0/23] fs-verity support for XFS with post EOF merkle tree
2026-01-20 7:32 ` Christoph Hellwig
@ 2026-01-20 11:44 ` Andrey Albershteyn
2026-01-20 17:34 ` Darrick J. Wong
2026-01-21 15:03 ` Christoph Hellwig
0 siblings, 2 replies; 86+ messages in thread
From: Andrey Albershteyn @ 2026-01-20 11:44 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Darrick J. Wong, Eric Biggers, Matthew Wilcox, fsverity,
linux-xfs, linux-fsdevel, aalbersh, david, tytso, linux-ext4,
jaegeuk, chao, linux-f2fs-devel
On 2026-01-20 08:32:18, Christoph Hellwig wrote:
> On Mon, Jan 19, 2026 at 11:58:16AM -0800, Darrick J. Wong wrote:
> > > > a) not all architectures are reasonable. As Darrick pointed out
> > > > hexagon seems to support page size up to 1MiB. While I don't know
> > > > if they exist in real life, powerpc supports up to 256kiB pages,
> > > > and I know they are used for real in various embedded settings
> >
> > They *did* way back in the day, I worked with some seekrit PPC440s early
> > in my career. I don't know that any of them still exist, but the code
> > is still there...
>
> Sorry, I meant I don't really know how real the hexagon large page
> sizes are. I know about the ppcs one personally, too.
>
> > > If we do need to fix this, there are a couple things we could consider
> > > doing without changing the on-disk format in ext4 or f2fs: putting the
> > > data in the page cache at a different offset than it exists on-disk, or
> > > using "small" pages for EOF specifically.
> >
> > I'd leave the ondisk offset as-is, but change the pagecache offset to
> > roundup(i_size_read(), mapping_max_folio_size_supported()) just to keep
> > file data and fsverity metadata completely separate.
>
> Can we find a way to do that in common code and make ext4 and f2fs do
> the same?
hmm, I don't see what else we could do except provide a common offset
helper and then use it to map blocks:
loff_t fsverity_metadata_offset(struct inode *inode)
{
	return roundup(i_size_read(inode), mapping_max_folio_size_supported());
}
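For illustration only, a userspace stand-in for the same rounding. The maximum folio size is passed in as a plain parameter because its real value depends on the kernel configuration; sketch_metadata_offset is a hypothetical name, not a proposed interface.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical userspace stand-in: round the file size up to the next
 * multiple of the largest folio size the page cache may use, so that
 * no folio can cover both file data and fsverity metadata.
 */
static uint64_t sketch_metadata_offset(uint64_t i_size,
				       uint64_t max_folio_size)
{
	assert(max_folio_size != 0);
	return (i_size + max_folio_size - 1) / max_folio_size * max_folio_size;
}
```

Assuming a 2MiB maximum folio size, a 100000 byte file would place its metadata at 2097152, and a file of exactly 2097152 bytes would place it at 2097152 as well, right at EOF.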
--
- Andrey
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: fsverity metadata offset, was: Re: [PATCH v2 0/23] fs-verity support for XFS with post EOF merkle tree
2026-01-20 11:44 ` Andrey Albershteyn
@ 2026-01-20 17:34 ` Darrick J. Wong
2026-01-21 15:03 ` Christoph Hellwig
1 sibling, 0 replies; 86+ messages in thread
From: Darrick J. Wong @ 2026-01-20 17:34 UTC (permalink / raw)
To: Andrey Albershteyn
Cc: Christoph Hellwig, Eric Biggers, Matthew Wilcox, fsverity,
linux-xfs, linux-fsdevel, aalbersh, david, tytso, linux-ext4,
jaegeuk, chao, linux-f2fs-devel
On Tue, Jan 20, 2026 at 12:44:19PM +0100, Andrey Albershteyn wrote:
> On 2026-01-20 08:32:18, Christoph Hellwig wrote:
> > On Mon, Jan 19, 2026 at 11:58:16AM -0800, Darrick J. Wong wrote:
> > > > > a) not all architectures are reasonable. As Darrick pointed out
> > > > > hexagon seems to support page size up to 1MiB. While I don't know
> > > > > if they exist in real life, powerpc supports up to 256kiB pages,
> > > > > and I know they are used for real in various embedded settings
> > >
> > > They *did* way back in the day, I worked with some seekrit PPC440s early
> > > in my career. I don't know that any of them still exist, but the code
> > > is still there...
> >
> > Sorry, I meant I don't really know how real the hexagon large page
> > sizes are. I know about the ppcs one personally, too.
> >
> > > > If we do need to fix this, there are a couple things we could consider
> > > > doing without changing the on-disk format in ext4 or f2fs: putting the
> > > > data in the page cache at a different offset than it exists on-disk, or
> > > > using "small" pages for EOF specifically.
> > >
> > > I'd leave the ondisk offset as-is, but change the pagecache offset to
> > > roundup(i_size_read(), mapping_max_folio_size_supported()) just to keep
> > > file data and fsverity metadata completely separate.
> >
> > Can we find a way to do that in common code and make ext4 and f2fs do
> > the same?
>
> hmm I don't see what else we could do except providing common offset
> and then use it to map blocks
>
> loff_t fsverity_metadata_offset(struct inode *inode)
> {
> return roundup(i_size_read(inode), mapping_max_folio_size_supported());
> }
Yeah, that's probably the best we can do. Please add a comment to that
helper to state explicitly that this is the *incore* file offset of the
merkle tree if the filesystem decides to cache it in the pagecache.
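A minimal userspace sketch of what such a commented helper might look like, for illustration only. The name fsverity_metadata_offset_model() and its two plain integer parameters are hypothetical stand-ins for the inode size and for mapping_max_folio_size_supported(); the real helper would take a struct inode and live in kernel code.

```c
#include <stdint.h>

/* Round x up to the next multiple of align (align must be nonzero). */
static uint64_t round_up_u64(uint64_t x, uint64_t align)
{
	return ((x + align - 1) / align) * align;
}

/*
 * Model of the *incore* file offset of the merkle tree, i.e. where the
 * tree would sit in the pagecache if the filesystem decides to cache it
 * there.  This says nothing about the ondisk location.
 *
 * i_size:         file size in bytes (assumed stand-in for i_size_read())
 * max_folio_size: largest folio size in bytes (assumed stand-in for
 *                 mapping_max_folio_size_supported())
 */
static uint64_t fsverity_metadata_offset_model(uint64_t i_size,
					       uint64_t max_folio_size)
{
	return round_up_u64(i_size, max_folio_size);
}
```

With a 64KiB maximum folio size, a 1-byte file and a file of exactly 65536 bytes would both put the tree at pagecache offset 65536.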
--D
> --
> - Andrey
>
>
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: fsverity metadata offset, was: Re: [PATCH v2 0/23] fs-verity support for XFS with post EOF merkle tree
2026-01-20 11:44 ` Andrey Albershteyn
2026-01-20 17:34 ` Darrick J. Wong
@ 2026-01-21 15:03 ` Christoph Hellwig
1 sibling, 0 replies; 86+ messages in thread
From: Christoph Hellwig @ 2026-01-21 15:03 UTC (permalink / raw)
To: Andrey Albershteyn
Cc: Christoph Hellwig, Darrick J. Wong, Eric Biggers, Matthew Wilcox,
fsverity, linux-xfs, linux-fsdevel, aalbersh, david, tytso,
linux-ext4, jaegeuk, chao, linux-f2fs-devel
On Tue, Jan 20, 2026 at 12:44:19PM +0100, Andrey Albershteyn wrote:
> > > I'd leave the ondisk offset as-is, but change the pagecache offset to
> > > roundup(i_size_read(), mapping_max_folio_size_supported()) just to keep
> > > file data and fsverity metadata completely separate.
> >
> > Can we find a way to do that in common code and make ext4 and f2fs do
> > the same?
>
> hmm I don't see what else we could do except providing common offset
> and then use it to map blocks
>
> loff_t fsverity_metadata_offset(struct inode *inode)
> {
> return roundup(i_size_read(inode), mapping_max_folio_size_supported());
> }
Something like that, yes.
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: fsverity metadata offset, was: Re: [PATCH v2 0/23] fs-verity support for XFS with post EOF merkle tree
2026-01-19 19:32 ` Eric Biggers
2026-01-19 19:58 ` Darrick J. Wong
@ 2026-01-19 20:00 ` Matthew Wilcox
1 sibling, 0 replies; 86+ messages in thread
From: Matthew Wilcox @ 2026-01-19 20:00 UTC (permalink / raw)
To: Eric Biggers
Cc: Christoph Hellwig, Andrey Albershteyn, Darrick J. Wong, fsverity,
linux-xfs, linux-fsdevel, aalbersh, david, tytso, linux-ext4,
jaegeuk, chao, linux-f2fs-devel
On Mon, Jan 19, 2026 at 11:32:42AM -0800, Eric Biggers wrote:
> On Mon, Jan 19, 2026 at 07:33:49AM +0100, Christoph Hellwig wrote:
> > While looking at fsverity I'd like to understand the choice of offset
> > in ext4 and f2fs, and wonder about an issue.
> >
> > Both ext4 and f2fs round up the inode size to the next 64k boundary
> > and place the metadata there. Both use the 65536 magic number for that
> > instead of a well documented constant unfortunately.
> >
> > I assume this was picked to align up to the largest reasonable page
> > size? Unfortunately for that:
> >
> > a) not all architectures are reasonable. As Darrick pointed out
> > hexagon seems to support page size up to 1MiB. While I don't know
> > if they exist in real life, powerpc supports up to 256kiB pages,
> > and I know they are used for real in various embedded settings
> > b) with large folio support in the page cache, the folios used to
> > map files can be much larger than the base page size, with all
> > the same issues as a larger page size
> >
> > So assuming that fsverity is trying to avoid the issue of a page/folio
> > that covers both data and fsverity metadata, how does it cope with that?
> > Do we need to disable fsverity on > 64k page size and disable large
> > folios on fsverity files? The latter would mean writing back all cached
> > data first as well.
> >
> > And going forward, should we have a v2 format that fixes this? For that
> > we'd still need a maximum folio size of course. And of course I'd like
> > to get all these things right from the start in XFS, while still being as
> > similar as possible to ext4/f2fs.
>
> Yes, if I recall correctly it was intended to be the "largest reasonable
> page size". It looks like PAGE_SIZE > 65536 can't work as-is, so indeed
> we should disable fsverity support in that configuration.
I don't think anybody will weep for lack of fsverity support in these
weirdo large PAGE_SIZE configurations.
> I don't think large folios are quite as problematic.
> ext4_read_merkle_tree_page() and f2fs_read_merkle_tree_page() read a
> folio and return the appropriate page in it, and fs/verity/verify.c
> operates on the page. If it's a page in the folio that spans EOF, I
> think everything will actually still work, except userspace will be able
> to see Merkle tree data after a 64K boundary past EOF if the file is
> mmapped using huge pages.
>
> The mmap issue isn't great, but I'm not sure how much it matters,
> especially when the zeroes do still go up to a 64K boundary.
We actually refuse to map pages after EOF. See filemap_map_pages()
if ((file_end >= folio_next_index(folio) || shmem_mapping(mapping)) &&
filemap_map_pmd(vmf, folio, start_pgoff)) {
ret = VM_FAULT_NOPAGE;
goto out;
}
along with the other treatment of end_pgoff.
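To make the quoted concern about a folio covering both data and fsverity metadata concrete, here is a rough userspace model. It assumes folios are naturally aligned to their own size and checks whether the folio holding the last data byte would also cover the start of the metadata. The function names are made up for illustration and are not kernel APIs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Round x up to the next multiple of align (align must be nonzero). */
static uint64_t round_up_u64(uint64_t x, uint64_t align)
{
	return ((x + align - 1) / align) * align;
}

/*
 * Return true if the naturally aligned folio of folio_size bytes that
 * holds the last byte of file data also covers the metadata offset,
 * taken here as i_size rounded up to meta_align.
 * i_size must be nonzero; all sizes are in bytes.
 */
static bool folio_straddles_metadata(uint64_t i_size, uint64_t meta_align,
				     uint64_t folio_size)
{
	uint64_t meta_off = round_up_u64(i_size, meta_align);
	uint64_t folio_start = ((i_size - 1) / folio_size) * folio_size;
	uint64_t folio_end = folio_start + folio_size;

	return meta_off < folio_end;
}
```

For a 100-byte file with the metadata at the 64K boundary, a 64KiB folio gives false while a 256KiB folio gives true; aligning the metadata to 256KiB instead makes the result false again.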
^ permalink raw reply [flat|nested] 86+ messages in thread