* [PATCH RFC 00/29] fs-verity support for XFS with post EOF merkle tree
@ 2025-07-28 20:30 Andrey Albershteyn
2025-07-28 20:30 ` [PATCH RFC 01/29] iomap: add iomap_writepages_unbound() to write beyond EOF Andrey Albershteyn
` (28 more replies)
0 siblings, 29 replies; 75+ messages in thread
From: Andrey Albershteyn @ 2025-07-28 20:30 UTC
To: fsverity, linux-fsdevel, linux-xfs, david, djwong, ebiggers, hch
Cc: Andrey Albershteyn, Andrey Albershteyn
Hi all,
This patchset adds fs-verity support for XFS. This version stores the Merkle
tree beyond the end of the file, similar to how ext4 does it.
The first two patches introduce a new iomap_read/write_region interface in
iomap. The reasons are:
- it is not bound by EOF,
- iomap_read_region() also allocates the folio and returns it to the caller.
Then follow changes to the fs-verity core, the per-filesystem workqueue,
and the iomap integration. These are mostly unchanged from previous patchsets.
The iomap read path has a bit of fs-verity-only zeroing logic for the
case when the tree block size, fs block size and page size differ. As the
tree is a contiguous region of memory, I just zero the tail of the tree region.
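As an illustration of the tail zeroing described above, here is a minimal
userspace sketch; it is not the code from the patches. It assumes the tree
is a contiguous byte range of tree_size bytes and that zeroing runs from the
end of the tree up to the next boundary of some power-of-two unit (for
example the page or folio size). The helper name and the choice of unit are
hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical helper: number of bytes between the end of the Merkle tree
 * and the next multiple of `unit` (assumed to be a power of two, e.g. the
 * page size). Under the assumptions above, these are the bytes that would
 * be zeroed so that nothing stale is exposed past the end of the tree.
 */
static uint64_t tree_tail_to_zero(uint64_t tree_size, uint64_t unit)
{
	uint64_t rem = tree_size & (unit - 1);

	return rem ? unit - rem : 0;
}
```

With a 1 KiB tree and a 4 KiB unit this yields 3072 bytes to zero; a tree
that already ends on a unit boundary needs no zeroing.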
Preallocations. I just disabled preallocations by setting the allocation
size to zero for Merkle tree data. This should not be a problem as these
files are read-only and in a stable state by the time we get to Merkle tree
writing. It would be nice to allocate the whole tree on first write, but I
haven't got to it yet.
The tree is read by iomap into page cache at offset 1 << 53. This seems
to be far enough to handle any supported file size.
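To make the offset arithmetic concrete, here is a small userspace sketch of
how a byte offset inside the tree could map to a page cache position and
page index. Only the 1 << 53 base comes from this cover letter; the macro and
function names are hypothetical, and a 4 KiB page size is assumed.

```c
#include <stdint.h>

/* Base offset named in the cover letter; the macro name is hypothetical. */
#define TREE_REGION_START	(UINT64_C(1) << 53)
/* Assumption for this sketch: 4 KiB pages. */
#define SKETCH_PAGE_SHIFT	12

/* Page cache byte position of the byte at `tree_off` within the tree. */
static uint64_t tree_pos(uint64_t tree_off)
{
	return TREE_REGION_START + tree_off;
}

/* Page cache index (pgoff_t in the kernel) of the page holding that byte. */
static uint64_t tree_page_index(uint64_t tree_off)
{
	return tree_pos(tree_off) >> SKETCH_PAGE_SHIFT;
}

/* File data stays clear of the tree region only while it ends at or below the base. */
static int data_fits_below_tree(uint64_t file_size)
{
	return file_size <= TREE_REGION_START;
}
```

Under these assumptions the first tree page sits at index 1 << 41, and any
file whose data ends at or below 8 PiB does not overlap the tree region.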
Testing. The -g verity group passes for 1k and 4k, with and without quota;
the tests cover different Merkle tree block sizes.
I plan to look into readahead and whole-tree allocation on first write.
xfsprogs also requires a bit more work.
Feedback is welcome :)
xfsprogs:
https://github.com/alberand/xfsprogs/tree/b4/fsverity
xfstests:
https://github.com/alberand/xfstests/tree/b4/fsverity
Cc: fsverity@lists.linux.dev
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-xfs@vger.kernel.org
Cc: david@fromorbit.com
Cc: djwong@kernel.org
Cc: ebiggers@kernel.org
Cc: hch@lst.de
[RFC] Directly mapped xattr data & fs-verity
[1]: https://lore.kernel.org/linux-xfs/20241229133350.1192387-1-aalbersh@kernel.org/
---
Andrey Albershteyn (19):
iomap: add iomap_writepages_unbound() to write beyond EOF
iomap: introduce iomap_read/write_region interface
fs: add FS_XFLAG_VERITY for verity files
fsverity: add per-sb workqueue for post read processing
fsverity: add tracepoints
iomap: integrate fs-verity verification into iomap's read path
xfs: add attribute type for fs-verity
xfs: add fs-verity ro-compat flag
xfs: add inode on-disk VERITY flag
xfs: initialize fs-verity on file open and cleanup on inode destruction
xfs: don't allow to enable DAX on fs-verity sealed inode
xfs: disable direct read path for fs-verity files
xfs: disable preallocations for fsverity Merkle tree writes
xfs: add writeback and iomap reading of Merkel tree pages
xfs: add fs-verity support
xfs: add fs-verity ioctls
xfs: fix scrub trace with null pointer in quotacheck
xfs: add fsverity traces
xfs: enable ro-compat fs-verity flag
Darrick J. Wong (10):
fsverity: report validation errors back to the filesystem
fsverity: pass super_block to fsverity_enqueue_verify_work
ext4: use a per-superblock fsverity workqueue
f2fs: use a per-superblock fsverity workqueue
btrfs: use a per-superblock fsverity workqueue
fsverity: remove system-wide workqueue
fsverity: expose merkle tree geometry to callers
xfs: advertise fs-verity being available on filesystem
xfs: check and repair the verity inode flag state
xfs: report verity failures through the health system
Documentation/filesystems/fsverity.rst | 8 +
MAINTAINERS | 1 +
fs/btrfs/super.c | 14 ++
fs/buffer.c | 7 +-
fs/ext4/readpage.c | 4 +-
fs/ext4/super.c | 11 ++
fs/f2fs/compress.c | 3 +-
fs/f2fs/data.c | 2 +-
fs/f2fs/super.c | 11 ++
fs/ioctl.c | 11 ++
fs/iomap/buffered-io.c | 301 ++++++++++++++++++++++++++++--
fs/iomap/ioend.c | 41 +++-
fs/super.c | 3 +
fs/verity/enable.c | 4 +
fs/verity/fsverity_private.h | 2 +-
fs/verity/init.c | 2 +-
fs/verity/open.c | 37 ++++
fs/verity/verify.c | 52 +++---
fs/xfs/Makefile | 1 +
fs/xfs/libxfs/xfs_da_format.h | 15 +-
fs/xfs/libxfs/xfs_format.h | 13 +-
fs/xfs/libxfs/xfs_fs.h | 2 +
fs/xfs/libxfs/xfs_health.h | 4 +-
fs/xfs/libxfs/xfs_inode_buf.c | 8 +
fs/xfs/libxfs/xfs_inode_util.c | 2 +
fs/xfs/libxfs/xfs_log_format.h | 1 +
fs/xfs/libxfs/xfs_sb.c | 4 +
fs/xfs/scrub/attr.c | 7 +
fs/xfs/scrub/common.c | 74 ++++++++
fs/xfs/scrub/common.h | 3 +
fs/xfs/scrub/inode.c | 7 +
fs/xfs/scrub/inode_repair.c | 36 ++++
fs/xfs/scrub/trace.h | 2 +-
fs/xfs/xfs_aops.c | 21 ++-
fs/xfs/xfs_bmap_util.c | 7 +
fs/xfs/xfs_file.c | 23 ++-
fs/xfs/xfs_fsverity.c | 330 +++++++++++++++++++++++++++++++++
fs/xfs/xfs_fsverity.h | 28 +++
fs/xfs/xfs_health.c | 1 +
fs/xfs/xfs_inode.h | 6 +
fs/xfs/xfs_ioctl.c | 16 ++
fs/xfs/xfs_iomap.c | 22 ++-
fs/xfs/xfs_iops.c | 4 +
fs/xfs/xfs_mount.h | 2 +
fs/xfs/xfs_super.c | 22 +++
fs/xfs/xfs_trace.h | 49 ++++-
include/linux/fs.h | 2 +
include/linux/fsverity.h | 49 ++++-
include/linux/iomap.h | 32 ++++
include/trace/events/fsverity.h | 162 ++++++++++++++++
include/uapi/linux/fs.h | 1 +
51 files changed, 1399 insertions(+), 71 deletions(-)
---
base-commit: 305d79226a6a797b193ca681e9f26f3bf081397b
change-id: 20250212-fsverity-eb66cef7fe9b
Best regards,
--
Andrey Albershteyn <aalbersh@kernel.org>
* [PATCH RFC 01/29] iomap: add iomap_writepages_unbound() to write beyond EOF
2025-07-28 20:30 [PATCH RFC 00/29] fs-verity support for XFS with post EOF merkle tree Andrey Albershteyn
@ 2025-07-28 20:30 ` Andrey Albershteyn
2025-07-29 22:07 ` Darrick J. Wong
2025-07-31 18:43 ` Joanne Koong
2025-07-28 20:30 ` [PATCH RFC 02/29] iomap: introduce iomap_read/write_region interface Andrey Albershteyn
` (27 subsequent siblings)
28 siblings, 2 replies; 75+ messages in thread
From: Andrey Albershteyn @ 2025-07-28 20:30 UTC
To: fsverity, linux-fsdevel, linux-xfs, david, djwong, ebiggers, hch
Cc: Andrey Albershteyn, Andrey Albershteyn
From: Andrey Albershteyn <aalbersh@redhat.com>
Add iomap_writepages_unbound(), a variant of iomap_writepages() that is
not limited by EOF. XFS will use it to write metadata (the fs-verity Merkle
tree) in a range far beyond EOF.
Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
---
fs/iomap/buffered-io.c | 51 +++++++++++++++++++++++++++++++++++++++-----------
include/linux/iomap.h | 3 +++
2 files changed, 43 insertions(+), 11 deletions(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 3729391a18f3..7bef232254a3 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1881,18 +1881,10 @@ static int iomap_writepage_map(struct iomap_writepage_ctx *wpc,
int error = 0;
u32 rlen;
- WARN_ON_ONCE(!folio_test_locked(folio));
- WARN_ON_ONCE(folio_test_dirty(folio));
- WARN_ON_ONCE(folio_test_writeback(folio));
-
- trace_iomap_writepage(inode, pos, folio_size(folio));
-
- if (!iomap_writepage_handle_eof(folio, inode, &end_pos)) {
- folio_unlock(folio);
- return 0;
- }
WARN_ON_ONCE(end_pos <= pos);
+ trace_iomap_writepage(inode, pos, folio_size(folio));
+
if (i_blocks_per_folio(inode, folio) > 1) {
if (!ifs) {
ifs = ifs_alloc(inode, folio, 0);
@@ -1956,6 +1948,23 @@ static int iomap_writepage_map(struct iomap_writepage_ctx *wpc,
return error;
}
+/* Map pages bound by EOF */
+static int iomap_writepage_map_eof(struct iomap_writepage_ctx *wpc,
+ struct writeback_control *wbc, struct folio *folio)
+{
+ int error;
+ struct inode *inode = folio->mapping->host;
+ u64 end_pos = folio_pos(folio) + folio_size(folio);
+
+ if (!iomap_writepage_handle_eof(folio, inode, &end_pos)) {
+ folio_unlock(folio);
+ return 0;
+ }
+
+ error = iomap_writepage_map(wpc, wbc, folio);
+ return error;
+}
+
int
iomap_writepages(struct address_space *mapping, struct writeback_control *wbc,
struct iomap_writepage_ctx *wpc,
@@ -1972,9 +1981,29 @@ iomap_writepages(struct address_space *mapping, struct writeback_control *wbc,
PF_MEMALLOC))
return -EIO;
+ wpc->ops = ops;
+ while ((folio = writeback_iter(mapping, wbc, folio, &error))) {
+ WARN_ON_ONCE(!folio_test_locked(folio));
+ WARN_ON_ONCE(folio_test_dirty(folio));
+ WARN_ON_ONCE(folio_test_writeback(folio));
+
+ error = iomap_writepage_map_eof(wpc, wbc, folio);
+ }
+ return iomap_submit_ioend(wpc, error);
+}
+EXPORT_SYMBOL_GPL(iomap_writepages);
+
+int
+iomap_writepages_unbound(struct address_space *mapping, struct writeback_control *wbc,
+ struct iomap_writepage_ctx *wpc,
+ const struct iomap_writeback_ops *ops)
+{
+ struct folio *folio = NULL;
+ int error;
+
wpc->ops = ops;
while ((folio = writeback_iter(mapping, wbc, folio, &error)))
error = iomap_writepage_map(wpc, wbc, folio);
return iomap_submit_ioend(wpc, error);
}
-EXPORT_SYMBOL_GPL(iomap_writepages);
+EXPORT_SYMBOL_GPL(iomap_writepages_unbound);
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 522644d62f30..4a0b5ebb79e9 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -464,6 +464,9 @@ void iomap_sort_ioends(struct list_head *ioend_list);
int iomap_writepages(struct address_space *mapping,
struct writeback_control *wbc, struct iomap_writepage_ctx *wpc,
const struct iomap_writeback_ops *ops);
+int iomap_writepages_unbound(struct address_space *mapping,
+ struct writeback_control *wbc, struct iomap_writepage_ctx *wpc,
+ const struct iomap_writeback_ops *ops);
/*
* Flags for direct I/O ->end_io:
--
2.50.0
* [PATCH RFC 02/29] iomap: introduce iomap_read/write_region interface
2025-07-28 20:30 [PATCH RFC 00/29] fs-verity support for XFS with post EOF merkle tree Andrey Albershteyn
2025-07-28 20:30 ` [PATCH RFC 01/29] iomap: add iomap_writepages_unbound() to write beyond EOF Andrey Albershteyn
@ 2025-07-28 20:30 ` Andrey Albershteyn
2025-07-29 22:22 ` Darrick J. Wong
2025-07-28 20:30 ` [PATCH RFC 03/29] fs: add FS_XFLAG_VERITY for verity files Andrey Albershteyn
` (26 subsequent siblings)
28 siblings, 1 reply; 75+ messages in thread
From: Andrey Albershteyn @ 2025-07-28 20:30 UTC
To: fsverity, linux-fsdevel, linux-xfs, david, djwong, ebiggers, hch
Cc: Andrey Albershteyn, Andrey Albershteyn
From: Andrey Albershteyn <aalbersh@redhat.com>
Add an interface for reading and writing data beyond EOF in an offset
region of the page cache.
Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
---
fs/iomap/buffered-io.c | 99 +++++++++++++++++++++++++++++++++++++++++++++++++-
include/linux/iomap.h | 16 ++++++++
2 files changed, 114 insertions(+), 1 deletion(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 7bef232254a3..e959a206cba9 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -321,6 +321,7 @@ struct iomap_readpage_ctx {
bool cur_folio_in_bio;
struct bio *bio;
struct readahead_control *rac;
+ int flags;
};
/**
@@ -387,7 +388,8 @@ static int iomap_readpage_iter(struct iomap_iter *iter,
if (plen == 0)
goto done;
- if (iomap_block_needs_zeroing(iter, pos)) {
+ if (iomap_block_needs_zeroing(iter, pos) &&
+ !(iomap->flags & IOMAP_F_BEYOND_EOF)) {
folio_zero_range(folio, poff, plen);
iomap_set_range_uptodate(folio, poff, plen);
goto done;
@@ -2007,3 +2009,98 @@ iomap_writepages_unbound(struct address_space *mapping, struct writeback_control
return iomap_submit_ioend(wpc, error);
}
EXPORT_SYMBOL_GPL(iomap_writepages_unbound);
+
+struct folio *
+iomap_read_region(struct ioregion *region)
+{
+ struct inode *inode = region->inode;
+ fgf_t fgp = FGP_CREAT | FGP_LOCK | fgf_set_order(region->length);
+ pgoff_t index = (region->pos) >> PAGE_SHIFT;
+ struct folio *folio = __filemap_get_folio(inode->i_mapping, index, fgp,
+ mapping_gfp_mask(inode->i_mapping));
+ int ret;
+ struct iomap_iter iter = {
+ .inode = folio->mapping->host,
+ .pos = region->pos,
+ .len = region->length,
+ };
+ struct iomap_readpage_ctx ctx = {
+ .cur_folio = folio,
+ };
+
+ if (folio_test_uptodate(folio)) {
+ folio_unlock(folio);
+ return folio;
+ }
+
+ while ((ret = iomap_iter(&iter, region->ops)) > 0)
+ iter.status = iomap_read_folio_iter(&iter, &ctx);
+
+ if (ctx.bio) {
+ submit_bio(ctx.bio);
+ WARN_ON_ONCE(!ctx.cur_folio_in_bio);
+ } else {
+ WARN_ON_ONCE(ctx.cur_folio_in_bio);
+ folio_unlock(folio);
+ }
+
+ return folio;
+}
+EXPORT_SYMBOL_GPL(iomap_read_region);
+
+static int iomap_write_region_iter(struct iomap_iter *iter, const void *buf)
+{
+ loff_t pos = iter->pos;
+ u64 bytes = iomap_length(iter);
+ int status;
+
+ do {
+ struct folio *folio;
+ size_t offset;
+ bool ret;
+
+ bytes = min_t(u64, SIZE_MAX, bytes);
+ status = iomap_write_begin(iter, &folio, &offset, &bytes);
+ if (status)
+ return status;
+ if (iter->iomap.flags & IOMAP_F_STALE)
+ break;
+
+ offset = offset_in_folio(folio, pos);
+ if (bytes > folio_size(folio) - offset)
+ bytes = folio_size(folio) - offset;
+
+ memcpy_to_folio(folio, offset, buf, bytes);
+
+ ret = iomap_write_end(iter, bytes, bytes, folio);
+ if (WARN_ON_ONCE(!ret))
+ return -EIO;
+
+ __iomap_put_folio(iter, bytes, folio);
+ if (WARN_ON_ONCE(!ret))
+ return -EIO;
+
+ status = iomap_iter_advance(iter, &bytes);
+ if (status)
+ break;
+ } while (bytes > 0);
+
+ return status;
+}
+
+int
+iomap_write_region(struct ioregion *region)
+{
+ struct iomap_iter iter = {
+ .inode = region->inode,
+ .pos = region->pos,
+ .len = region->length,
+ };
+ ssize_t ret;
+
+ while ((ret = iomap_iter(&iter, region->ops)) > 0)
+ iter.status = iomap_write_region_iter(&iter, region->buf);
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(iomap_write_region);
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 4a0b5ebb79e9..73288f28543f 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -83,6 +83,11 @@ struct vm_fault;
*/
#define IOMAP_F_PRIVATE (1U << 12)
+/*
+ * Writes happen beyond inode EOF
+ */
+#define IOMAP_F_BEYOND_EOF (1U << 13)
+
/*
* Flags set by the core iomap code during operations:
*
@@ -533,4 +538,15 @@ int iomap_swapfile_activate(struct swap_info_struct *sis,
extern struct bio_set iomap_ioend_bioset;
+struct ioregion {
+ struct inode *inode;
+ loff_t pos; /* IO position */
+ const void *buf; /* Data to be written (in only) */
+ size_t length; /* Length of the data */
+ const struct iomap_ops *ops;
+};
+
+struct folio *iomap_read_region(struct ioregion *region);
+int iomap_write_region(struct ioregion *region);
+
#endif /* LINUX_IOMAP_H */
--
2.50.0
* [PATCH RFC 03/29] fs: add FS_XFLAG_VERITY for verity files
2025-07-28 20:30 [PATCH RFC 00/29] fs-verity support for XFS with post EOF merkle tree Andrey Albershteyn
2025-07-28 20:30 ` [PATCH RFC 01/29] iomap: add iomap_writepages_unbound() to write beyond EOF Andrey Albershteyn
2025-07-28 20:30 ` [PATCH RFC 02/29] iomap: introduce iomap_read/write_region interface Andrey Albershteyn
@ 2025-07-28 20:30 ` Andrey Albershteyn
2025-07-29 9:53 ` Amir Goldstein
2025-08-12 7:51 ` Christoph Hellwig
2025-07-28 20:30 ` [PATCH RFC 04/29] fsverity: add per-sb workqueue for post read processing Andrey Albershteyn
` (25 subsequent siblings)
28 siblings, 2 replies; 75+ messages in thread
From: Andrey Albershteyn @ 2025-07-28 20:30 UTC
To: fsverity, linux-fsdevel, linux-xfs, david, djwong, ebiggers, hch
Cc: Andrey Albershteyn, Andrey Albershteyn
From: Andrey Albershteyn <aalbersh@redhat.com>
Add the extended file attribute flag FS_XFLAG_VERITY, which is reported
for inodes with fs-verity enabled.
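For illustration, a userspace caller might test the new flag roughly as
follows. This is a sketch, not part of the patch: the helper names are
hypothetical, and the fallback #define copies the value 0x00020000 from this
patch because existing uapi headers do not carry it yet. The second helper
mirrors the XOR test that the patch adds to fileattr_set_prepare() to detect
an attempt to change the verity bit.

```c
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/fs.h>

/* Value taken from this patch; older uapi headers do not define it. */
#ifndef FS_XFLAG_VERITY
#define FS_XFLAG_VERITY 0x00020000
#endif

/* Returns 1 if verity is reported for fd, 0 if not, -1 if the ioctl fails. */
static int fd_has_verity(int fd)
{
	struct fsxattr fsx;

	if (ioctl(fd, FS_IOC_FSGETXATTR, &fsx) < 0)
		return -1;
	return (fsx.fsx_xflags & FS_XFLAG_VERITY) ? 1 : 0;
}

/* Nonzero if the verity bit differs between the old and new xflags. */
static int verity_flag_changed(uint32_t old_xflags, uint32_t new_xflags)
{
	return ((old_xflags ^ new_xflags) & FS_XFLAG_VERITY) != 0;
}
```

On a kernel without this patch the bit is simply never set, so
fd_has_verity() returns 0 there even for verity files.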
Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
[djwong: fix broken verity flag checks]
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
---
Documentation/filesystems/fsverity.rst | 8 ++++++++
fs/ioctl.c | 11 +++++++++++
include/uapi/linux/fs.h | 1 +
3 files changed, 20 insertions(+)
diff --git a/Documentation/filesystems/fsverity.rst b/Documentation/filesystems/fsverity.rst
index dacdbc1149e6..33b588c32ed1 100644
--- a/Documentation/filesystems/fsverity.rst
+++ b/Documentation/filesystems/fsverity.rst
@@ -342,6 +342,14 @@ the file has fs-verity enabled. This can perform better than
FS_IOC_GETFLAGS and FS_IOC_MEASURE_VERITY because it doesn't require
opening the file, and opening verity files can be expensive.
+FS_IOC_FSGETXATTR
+-----------------
+
+Since Linux v6.17, the FS_IOC_FSGETXATTR ioctl sets FS_XFLAG_VERITY (0x00020000)
+in the returned flags when the file has verity enabled. Note that this attribute
+cannot be set with FS_IOC_FSSETXATTR as enabling verity requires input
+parameters. See FS_IOC_ENABLE_VERITY.
+
.. _accessing_verity_files:
Accessing verity files
diff --git a/fs/ioctl.c b/fs/ioctl.c
index 69107a245b4c..6b94da2b93f5 100644
--- a/fs/ioctl.c
+++ b/fs/ioctl.c
@@ -480,6 +480,8 @@ void fileattr_fill_xflags(struct fileattr *fa, u32 xflags)
fa->flags |= FS_DAX_FL;
if (fa->fsx_xflags & FS_XFLAG_PROJINHERIT)
fa->flags |= FS_PROJINHERIT_FL;
+ if (fa->fsx_xflags & FS_XFLAG_VERITY)
+ fa->flags |= FS_VERITY_FL;
}
EXPORT_SYMBOL(fileattr_fill_xflags);
@@ -510,6 +512,8 @@ void fileattr_fill_flags(struct fileattr *fa, u32 flags)
fa->fsx_xflags |= FS_XFLAG_DAX;
if (fa->flags & FS_PROJINHERIT_FL)
fa->fsx_xflags |= FS_XFLAG_PROJINHERIT;
+ if (fa->flags & FS_VERITY_FL)
+ fa->fsx_xflags |= FS_XFLAG_VERITY;
}
EXPORT_SYMBOL(fileattr_fill_flags);
@@ -640,6 +644,13 @@ static int fileattr_set_prepare(struct inode *inode,
!(S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode)))
return -EINVAL;
+ /*
+ * Verity cannot be changed through FS_IOC_FSSETXATTR/FS_IOC_SETFLAGS.
+ * See FS_IOC_ENABLE_VERITY.
+ */
+ if ((fa->fsx_xflags ^ old_ma->fsx_xflags) & FS_XFLAG_VERITY)
+ return -EINVAL;
+
/* Extent size hints of zero turn off the flags. */
if (fa->fsx_extsize == 0)
fa->fsx_xflags &= ~(FS_XFLAG_EXTSIZE | FS_XFLAG_EXTSZINHERIT);
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index 0098b0ce8ccb..20777ec55f7b 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -167,6 +167,7 @@ struct fsxattr {
#define FS_XFLAG_FILESTREAM 0x00004000 /* use filestream allocator */
#define FS_XFLAG_DAX 0x00008000 /* use DAX for IO */
#define FS_XFLAG_COWEXTSIZE 0x00010000 /* CoW extent size allocator hint */
+#define FS_XFLAG_VERITY 0x00020000 /* fs-verity enabled */
#define FS_XFLAG_HASATTR 0x80000000 /* no DIFLAG for this */
/* the read-only stuff doesn't really belong here, but any other place is
--
2.50.0
* [PATCH RFC 04/29] fsverity: add per-sb workqueue for post read processing
2025-07-28 20:30 [PATCH RFC 00/29] fs-verity support for XFS with post EOF merkle tree Andrey Albershteyn
` (2 preceding siblings ...)
2025-07-28 20:30 ` [PATCH RFC 03/29] fs: add FS_XFLAG_VERITY for verity files Andrey Albershteyn
@ 2025-07-28 20:30 ` Andrey Albershteyn
2025-08-11 11:45 ` Christoph Hellwig
2025-07-28 20:30 ` [PATCH RFC 05/29] fsverity: add tracepoints Andrey Albershteyn
` (24 subsequent siblings)
28 siblings, 1 reply; 75+ messages in thread
From: Andrey Albershteyn @ 2025-07-28 20:30 UTC
To: fsverity, linux-fsdevel, linux-xfs, david, djwong, ebiggers, hch
Cc: Andrey Albershteyn
From: Andrey Albershteyn <aalbersh@redhat.com>
For XFS, fsverity's global workqueue is not really suitable due to:
1. High priority workqueues are used within XFS to ensure that data
IO completion cannot stall processing of journal IO completions.
Hence using a WQ_HIGHPRI workqueue directly in the user data IO
path is a potential filesystem livelock/deadlock vector.
2. The fsverity workqueue is global - it creates a cross-filesystem
contention point.
This patch adds a per-filesystem, per-CPU workqueue for fsverity
work. This allows iomap to queue verification work in the read path on
BIO completion.
Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
[djwong: make it clearer that this workqueue is for verity]
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
fs/super.c | 3 +++
fs/verity/verify.c | 14 ++++++++++++++
include/linux/fs.h | 2 ++
include/linux/fsverity.h | 18 ++++++++++++++++++
4 files changed, 37 insertions(+)
diff --git a/fs/super.c b/fs/super.c
index 80418ca8e215..f80fe7395228 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -37,6 +37,7 @@
#include <linux/user_namespace.h>
#include <linux/fs_context.h>
#include <uapi/linux/mount.h>
+#include <linux/fsverity.h>
#include "internal.h"
static int thaw_super_locked(struct super_block *sb, enum freeze_holder who,
@@ -639,6 +640,8 @@ void generic_shutdown_super(struct super_block *sb)
sb->s_dio_done_wq = NULL;
}
+ fsverity_destroy_wq(sb);
+
if (sop->put_super)
sop->put_super(sb);
diff --git a/fs/verity/verify.c b/fs/verity/verify.c
index 4fcad0825a12..30a3f6ada2ad 100644
--- a/fs/verity/verify.c
+++ b/fs/verity/verify.c
@@ -334,6 +334,20 @@ void fsverity_verify_bio(struct bio *bio)
EXPORT_SYMBOL_GPL(fsverity_verify_bio);
#endif /* CONFIG_BLOCK */
+int fsverity_init_wq(struct super_block *sb, unsigned int wq_flags,
+ int max_active)
+{
+ WARN_ON_ONCE(sb->s_verity_wq != NULL);
+
+ sb->s_verity_wq = alloc_workqueue("fsverity/%s", wq_flags, max_active,
+ sb->s_id);
+ if (!sb->s_verity_wq)
+ return -ENOMEM;
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(fsverity_init_wq);
+
/**
* fsverity_enqueue_verify_work() - enqueue work on the fs-verity workqueue
* @work: the work to enqueue
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 040c0036320f..abe31e9959fa 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1350,6 +1350,8 @@ struct super_block {
#endif
#ifdef CONFIG_FS_VERITY
const struct fsverity_operations *s_vop;
+ /* Completion queue for post read verification */
+ struct workqueue_struct *s_verity_wq;
#endif
#if IS_ENABLED(CONFIG_UNICODE)
struct unicode_map *s_encoding;
diff --git a/include/linux/fsverity.h b/include/linux/fsverity.h
index 1eb7eae580be..9b91bd54fb75 100644
--- a/include/linux/fsverity.h
+++ b/include/linux/fsverity.h
@@ -174,6 +174,17 @@ bool fsverity_verify_blocks(struct folio *folio, size_t len, size_t offset);
void fsverity_verify_bio(struct bio *bio);
void fsverity_enqueue_verify_work(struct work_struct *work);
+int fsverity_init_wq(struct super_block *sb, unsigned int wq_flags,
+ int max_active);
+
+static inline void fsverity_destroy_wq(struct super_block *sb)
+{
+ if (sb->s_verity_wq) {
+ destroy_workqueue(sb->s_verity_wq);
+ sb->s_verity_wq = NULL;
+ }
+}
+
#else /* !CONFIG_FS_VERITY */
static inline struct fsverity_info *fsverity_get_info(const struct inode *inode)
@@ -251,6 +262,13 @@ static inline void fsverity_enqueue_verify_work(struct work_struct *work)
WARN_ON_ONCE(1);
}
+static inline int fsverity_init_wq(struct super_block *sb, unsigned int wq_flags, int max_active)
+{
+ return -EOPNOTSUPP;
+}
+
+static inline void fsverity_destroy_wq(struct super_block *sb) { }
+
#endif /* !CONFIG_FS_VERITY */
static inline bool fsverity_verify_folio(struct folio *folio)
--
2.50.0
* [PATCH RFC 05/29] fsverity: add tracepoints
2025-07-28 20:30 [PATCH RFC 00/29] fs-verity support for XFS with post EOF merkle tree Andrey Albershteyn
` (3 preceding siblings ...)
2025-07-28 20:30 ` [PATCH RFC 04/29] fsverity: add per-sb workqueue for post read processing Andrey Albershteyn
@ 2025-07-28 20:30 ` Andrey Albershteyn
2025-07-28 20:30 ` [PATCH RFC 06/29] fsverity: report validation errors back to the filesystem Andrey Albershteyn
` (23 subsequent siblings)
28 siblings, 0 replies; 75+ messages in thread
From: Andrey Albershteyn @ 2025-07-28 20:30 UTC
To: fsverity, linux-fsdevel, linux-xfs, david, djwong, ebiggers, hch
Cc: Andrey Albershteyn
From: Andrey Albershteyn <aalbersh@redhat.com>
fs-verity previously had debug printk calls, but they were removed. This
patch adds tracepoints at the same places where the printk calls were used
(plus a few additional ones).
Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: fix formatting]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
MAINTAINERS | 1 +
fs/verity/enable.c | 4 ++
fs/verity/fsverity_private.h | 2 +
fs/verity/init.c | 1 +
fs/verity/verify.c | 9 +++
include/trace/events/fsverity.h | 143 ++++++++++++++++++++++++++++++++++++++++
6 files changed, 160 insertions(+)
diff --git a/MAINTAINERS b/MAINTAINERS
index 60bba48f5479..64575d2007f2 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9851,6 +9851,7 @@ T: git https://git.kernel.org/pub/scm/fs/fsverity/linux.git
F: Documentation/filesystems/fsverity.rst
F: fs/verity/
F: include/linux/fsverity.h
+F: include/trace/events/fsverity.h
F: include/uapi/linux/fsverity.h
FT260 FTDI USB-HID TO I2C BRIDGE DRIVER
diff --git a/fs/verity/enable.c b/fs/verity/enable.c
index c284f46d1b53..7cf902051b2d 100644
--- a/fs/verity/enable.c
+++ b/fs/verity/enable.c
@@ -227,6 +227,8 @@ static int enable_verity(struct file *filp,
if (err)
goto out;
+ trace_fsverity_enable(inode, &params);
+
/*
* Start enabling verity on this file, serialized by the inode lock.
* Fail if verity is already enabled or is already being enabled.
@@ -269,6 +271,8 @@ static int enable_verity(struct file *filp,
goto rollback;
}
+ trace_fsverity_tree_done(inode, vi, &params);
+
/*
* Tell the filesystem to finish enabling verity on the file.
* Serialized with ->begin_enable_verity() by the inode lock.
diff --git a/fs/verity/fsverity_private.h b/fs/verity/fsverity_private.h
index b3506f56e180..04dd471d791c 100644
--- a/fs/verity/fsverity_private.h
+++ b/fs/verity/fsverity_private.h
@@ -154,4 +154,6 @@ static inline void fsverity_init_signature(void)
void __init fsverity_init_workqueue(void);
+#include <trace/events/fsverity.h>
+
#endif /* _FSVERITY_PRIVATE_H */
diff --git a/fs/verity/init.c b/fs/verity/init.c
index 6e8d33b50240..d65206608583 100644
--- a/fs/verity/init.c
+++ b/fs/verity/init.c
@@ -5,6 +5,7 @@
* Copyright 2019 Google LLC
*/
+#define CREATE_TRACE_POINTS
#include "fsverity_private.h"
#include <linux/ratelimit.h>
diff --git a/fs/verity/verify.c b/fs/verity/verify.c
index 30a3f6ada2ad..580486168467 100644
--- a/fs/verity/verify.c
+++ b/fs/verity/verify.c
@@ -109,6 +109,9 @@ verify_data_block(struct inode *inode, struct fsverity_info *vi,
/* Byte offset of the wanted hash relative to @addr */
unsigned int hoffset;
} hblocks[FS_VERITY_MAX_LEVELS];
+
+ trace_fsverity_verify_data_block(inode, params, data_pos);
+
/*
* The index of the previous level's block within that level; also the
* index of that block's hash within the current level.
@@ -184,6 +187,9 @@ verify_data_block(struct inode *inode, struct fsverity_info *vi,
want_hash = _want_hash;
kunmap_local(haddr);
put_page(hpage);
+ trace_fsverity_merkle_hit(inode, data_pos, hblock_idx,
+ level,
+ hoffset >> params->log_digestsize);
goto descend;
}
hblocks[level].page = hpage;
@@ -219,6 +225,9 @@ verify_data_block(struct inode *inode, struct fsverity_info *vi,
want_hash = _want_hash;
kunmap_local(haddr);
put_page(hpage);
+ trace_fsverity_verify_merkle_block(inode,
+ hblock_idx << params->log_blocksize,
+ level, hoffset >> params->log_digestsize);
}
/* Finally, verify the data block. */
diff --git a/include/trace/events/fsverity.h b/include/trace/events/fsverity.h
new file mode 100644
index 000000000000..dab220884b89
--- /dev/null
+++ b/include/trace/events/fsverity.h
@@ -0,0 +1,143 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM fsverity
+
+#if !defined(_TRACE_FSVERITY_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_FSVERITY_H
+
+#include <linux/tracepoint.h>
+
+struct fsverity_descriptor;
+struct merkle_tree_params;
+struct fsverity_info;
+
+TRACE_EVENT(fsverity_enable,
+ TP_PROTO(const struct inode *inode,
+ const struct merkle_tree_params *params),
+ TP_ARGS(inode, params),
+ TP_STRUCT__entry(
+ __field(ino_t, ino)
+ __field(u64, data_size)
+ __field(unsigned int, block_size)
+ __field(unsigned int, num_levels)
+ __field(u64, tree_size)
+ ),
+ TP_fast_assign(
+ __entry->ino = inode->i_ino;
+ __entry->data_size = i_size_read(inode);
+ __entry->block_size = params->block_size;
+ __entry->num_levels = params->num_levels;
+ __entry->tree_size = params->tree_size;
+ ),
+ TP_printk("ino %lu data size %llu tree size %llu block size %u levels %u",
+ (unsigned long) __entry->ino,
+ __entry->data_size,
+ __entry->tree_size,
+ __entry->block_size,
+ __entry->num_levels)
+);
+
+TRACE_EVENT(fsverity_tree_done,
+ TP_PROTO(const struct inode *inode, const struct fsverity_info *vi,
+ const struct merkle_tree_params *params),
+ TP_ARGS(inode, vi, params),
+ TP_STRUCT__entry(
+ __field(ino_t, ino)
+ __field(unsigned int, levels)
+ __field(unsigned int, block_size)
+ __field(u64, tree_size)
+ __dynamic_array(u8, root_hash, params->digest_size)
+ __dynamic_array(u8, file_digest, params->digest_size)
+ ),
+ TP_fast_assign(
+ __entry->ino = inode->i_ino;
+ __entry->levels = params->num_levels;
+ __entry->block_size = params->block_size;
+ __entry->tree_size = params->tree_size;
+ memcpy(__get_dynamic_array(root_hash), vi->root_hash, __get_dynamic_array_len(root_hash));
+ memcpy(__get_dynamic_array(file_digest), vi->file_digest, __get_dynamic_array_len(file_digest));
+ ),
+ TP_printk("ino %lu levels %d block_size %d tree_size %lld root_hash %s digest %s",
+ (unsigned long) __entry->ino,
+ __entry->levels,
+ __entry->block_size,
+ __entry->tree_size,
+ __print_hex_str(__get_dynamic_array(root_hash), __get_dynamic_array_len(root_hash)),
+ __print_hex_str(__get_dynamic_array(file_digest), __get_dynamic_array_len(file_digest)))
+);
+
+TRACE_EVENT(fsverity_verify_data_block,
+ TP_PROTO(const struct inode *inode,
+ const struct merkle_tree_params *params,
+ u64 data_pos),
+ TP_ARGS(inode, params, data_pos),
+ TP_STRUCT__entry(
+ __field(ino_t, ino)
+ __field(u64, data_pos)
+ __field(unsigned int, block_size)
+ ),
+ TP_fast_assign(
+ __entry->ino = inode->i_ino;
+ __entry->data_pos = data_pos;
+ __entry->block_size = params->block_size;
+ ),
+ TP_printk("ino %lu pos %lld merkle_blocksize %u",
+ (unsigned long) __entry->ino,
+ __entry->data_pos,
+ __entry->block_size)
+);
+
+TRACE_EVENT(fsverity_merkle_hit,
+ TP_PROTO(const struct inode *inode, u64 data_pos,
+ unsigned long hblock_idx, unsigned int level,
+ unsigned int hidx),
+ TP_ARGS(inode, data_pos, hblock_idx, level, hidx),
+ TP_STRUCT__entry(
+ __field(ino_t, ino)
+ __field(u64, data_pos)
+ __field(unsigned long, hblock_idx)
+ __field(unsigned int, level)
+ __field(unsigned int, hidx)
+ ),
+ TP_fast_assign(
+ __entry->ino = inode->i_ino;
+ __entry->data_pos = data_pos;
+ __entry->hblock_idx = hblock_idx;
+ __entry->level = level;
+ __entry->hidx = hidx;
+ ),
+ TP_printk("ino %lu data_pos %llu hblock_idx %lu level %u hidx %u",
+ (unsigned long) __entry->ino,
+ __entry->data_pos,
+ __entry->hblock_idx,
+ __entry->level,
+ __entry->hidx)
+);
+
+TRACE_EVENT(fsverity_verify_merkle_block,
+ TP_PROTO(const struct inode *inode, unsigned long index,
+ unsigned int level, unsigned int hidx),
+ TP_ARGS(inode, index, level, hidx),
+ TP_STRUCT__entry(
+ __field(ino_t, ino)
+ __field(unsigned long, index)
+ __field(unsigned int, level)
+ __field(unsigned int, hidx)
+ ),
+ TP_fast_assign(
+ __entry->ino = inode->i_ino;
+ __entry->index = index;
+ __entry->level = level;
+ __entry->hidx = hidx;
+ ),
+ TP_printk("ino %lu index %lu level %u hidx %u",
+ (unsigned long) __entry->ino,
+ __entry->index,
+ __entry->level,
+ __entry->hidx)
+);
+
+#endif /* _TRACE_FSVERITY_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
--
2.50.0
* [PATCH RFC 06/29] fsverity: report validation errors back to the filesystem
2025-07-28 20:30 [PATCH RFC 00/29] fs-verity support for XFS with post EOF merkle tree Andrey Albershteyn
` (4 preceding siblings ...)
2025-07-28 20:30 ` [PATCH RFC 05/29] fsverity: add tracepoints Andrey Albershteyn
@ 2025-07-28 20:30 ` Andrey Albershteyn
2025-08-11 11:46 ` Christoph Hellwig
2025-07-28 20:30 ` [PATCH RFC 07/29] fsverity: pass super_block to fsverity_enqueue_verify_work Andrey Albershteyn
` (22 subsequent siblings)
28 siblings, 1 reply; 75+ messages in thread
From: Andrey Albershteyn @ 2025-07-28 20:30 UTC (permalink / raw)
To: fsverity, linux-fsdevel, linux-xfs, david, djwong, ebiggers, hch
From: "Darrick J. Wong" <djwong@kernel.org>
Provide a new function call so that validation errors can be reported
back to the filesystem.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
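For reviewers who want to see the shape of the new hook outside the kernel, here is a minimal userspace sketch of the optional-callback pattern used in verify_data_block(). All sketch_* names are hypothetical stand-ins, not kernel APIs; the only part taken from the patch is the rule that the hook is called only when the filesystem provides it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_inode;

/* Stand-in for struct fsverity_operations; only the new hook is modelled. */
struct sketch_verity_ops {
	void (*file_corrupt)(struct sketch_inode *inode, int64_t pos, size_t len);
};

/* Stand-in for struct inode, with fields that record the last report. */
struct sketch_inode {
	const struct sketch_verity_ops *vop;
	int64_t bad_pos;
	size_t bad_len;
	unsigned int nr_reports;
};

/* What a filesystem's hook might do: remember the corrupt range. */
static void sketch_record_corruption(struct sketch_inode *inode, int64_t pos,
				     size_t len)
{
	inode->bad_pos = pos;
	inode->bad_len = len;
	inode->nr_reports++;
}

static const struct sketch_verity_ops sketch_ops_with_hook = {
	.file_corrupt = sketch_record_corruption,
};

static const struct sketch_verity_ops sketch_ops_without_hook = {
	.file_corrupt = NULL,
};

/* Mirrors the call site: the hook is optional, so test it before calling. */
static void sketch_report_corruption(struct sketch_inode *inode, int64_t pos,
				     size_t len)
{
	assert(inode && inode->vop);
	if (inode->vop->file_corrupt)
		inode->vop->file_corrupt(inode, pos, len);
}
```

A filesystem that does not set the hook sees no behaviour change, which is why the call site needs the NULL check.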
fs/verity/verify.c | 4 ++++
include/linux/fsverity.h | 14 ++++++++++++++
include/trace/events/fsverity.h | 19 +++++++++++++++++++
3 files changed, 37 insertions(+)
diff --git a/fs/verity/verify.c b/fs/verity/verify.c
index 580486168467..917a5fb5388f 100644
--- a/fs/verity/verify.c
+++ b/fs/verity/verify.c
@@ -243,6 +243,10 @@ verify_data_block(struct inode *inode, struct fsverity_info *vi,
data_pos, level - 1,
params->hash_alg->name, hsize, want_hash,
params->hash_alg->name, hsize, real_hash);
+ trace_fsverity_file_corrupt(inode, data_pos, params->block_size);
+ if (inode->i_sb->s_vop->file_corrupt)
+ inode->i_sb->s_vop->file_corrupt(inode, data_pos,
+ params->block_size);
error:
for (; level > 0; level--) {
kunmap_local(hblocks[level - 1].addr);
diff --git a/include/linux/fsverity.h b/include/linux/fsverity.h
index 9b91bd54fb75..d263e988bec2 100644
--- a/include/linux/fsverity.h
+++ b/include/linux/fsverity.h
@@ -120,6 +120,20 @@ struct fsverity_operations {
*/
int (*write_merkle_tree_block)(struct inode *inode, const void *buf,
u64 pos, unsigned int size);
+
+ /**
+ * Notify the filesystem that file data is corrupt.
+ *
+ * @inode: the inode being validated
+ * @pos: the file position of the invalid data
+ * @len: the length of the invalid data
+ *
+ * This function is called when fs-verity detects that a portion of a
+ * file's data is inconsistent with the Merkle tree, or a Merkle tree
+ * block needed to validate the data is inconsistent with the level
+ * above it.
+ */
+ void (*file_corrupt)(struct inode *inode, loff_t pos, size_t len);
};
#ifdef CONFIG_FS_VERITY
diff --git a/include/trace/events/fsverity.h b/include/trace/events/fsverity.h
index dab220884b89..375fdddac6a9 100644
--- a/include/trace/events/fsverity.h
+++ b/include/trace/events/fsverity.h
@@ -137,6 +137,25 @@ TRACE_EVENT(fsverity_verify_merkle_block,
__entry->hidx)
);
+TRACE_EVENT(fsverity_file_corrupt,
+ TP_PROTO(const struct inode *inode, loff_t pos, size_t len),
+ TP_ARGS(inode, pos, len),
+ TP_STRUCT__entry(
+ __field(ino_t, ino)
+ __field(loff_t, pos)
+ __field(size_t, len)
+ ),
+ TP_fast_assign(
+ __entry->ino = inode->i_ino;
+ __entry->pos = pos;
+ __entry->len = len;
+ ),
+ TP_printk("ino %lu pos %llu len %zu",
+ (unsigned long) __entry->ino,
+ __entry->pos,
+ __entry->len)
+);
+
#endif /* _TRACE_FSVERITY_H */
/* This part must be outside protection */
--
2.50.0
* [PATCH RFC 07/29] fsverity: pass super_block to fsverity_enqueue_verify_work
2025-07-28 20:30 [PATCH RFC 00/29] fs-verity support for XFS with post EOF merkle tree Andrey Albershteyn
` (5 preceding siblings ...)
2025-07-28 20:30 ` [PATCH RFC 06/29] fsverity: report validation errors back to the filesystem Andrey Albershteyn
@ 2025-07-28 20:30 ` Andrey Albershteyn
2025-07-28 20:30 ` [PATCH RFC 08/29] ext4: use a per-superblock fsverity workqueue Andrey Albershteyn
` (21 subsequent siblings)
28 siblings, 0 replies; 75+ messages in thread
From: Andrey Albershteyn @ 2025-07-28 20:30 UTC (permalink / raw)
To: fsverity, linux-fsdevel, linux-xfs, david, djwong, ebiggers, hch
From: "Darrick J. Wong" <djwong@kernel.org>
In preparation for having per-superblock fsverity workqueues, pass the
super_block object to fsverity_enqueue_verify_work.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
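The queue selection in fsverity_enqueue_verify_work() at this point in the series can be sketched in userspace as follows. The sketch_* types are hypothetical stand-ins; the logic only models the "per-sb queue if set, otherwise the global one" fallback that this patch introduces and that patch 11 later removes.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for struct workqueue_struct and struct super_block. */
struct sketch_wq {
	const char *name;
};

struct sketch_sb {
	struct sketch_wq *s_verity_wq;	/* NULL until the fs sets one up */
};

/* Models the system-wide queue that still exists at this patch. */
static struct sketch_wq sketch_global_wq = { "fsverity_read_queue" };

/* Equivalent of "sb->s_verity_wq ?: fsverity_read_workqueue". */
static struct sketch_wq *sketch_pick_wq(const struct sketch_sb *sb)
{
	assert(sb);
	return sb->s_verity_wq ? sb->s_verity_wq : &sketch_global_wq;
}
```

The fallback keeps filesystems that have not been converted yet working between this patch and the removal of the global queue.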
fs/buffer.c | 7 +++++--
fs/ext4/readpage.c | 4 +++-
fs/f2fs/compress.c | 3 ++-
fs/f2fs/data.c | 2 +-
fs/verity/verify.c | 6 ++++--
include/linux/fsverity.h | 6 ++++--
6 files changed, 19 insertions(+), 9 deletions(-)
diff --git a/fs/buffer.c b/fs/buffer.c
index 8cf4a1dc481e..9d56eb353ced 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -336,13 +336,15 @@ static void decrypt_bh(struct work_struct *work)
err = fscrypt_decrypt_pagecache_blocks(bh->b_folio, bh->b_size,
bh_offset(bh));
if (err == 0 && need_fsverity(bh)) {
+ struct super_block *sb = bh->b_folio->mapping->host->i_sb;
+
/*
* We use different work queues for decryption and for verity
* because verity may require reading metadata pages that need
* decryption, and we shouldn't recurse to the same workqueue.
*/
INIT_WORK(&ctx->work, verify_bh);
- fsverity_enqueue_verify_work(&ctx->work);
+ fsverity_enqueue_verify_work(sb, &ctx->work);
return;
}
end_buffer_async_read(bh, err == 0);
@@ -371,7 +373,8 @@ static void end_buffer_async_read_io(struct buffer_head *bh, int uptodate)
fscrypt_enqueue_decrypt_work(&ctx->work);
} else {
INIT_WORK(&ctx->work, verify_bh);
- fsverity_enqueue_verify_work(&ctx->work);
+ fsverity_enqueue_verify_work(inode->i_sb,
+ &ctx->work);
}
return;
}
diff --git a/fs/ext4/readpage.c b/fs/ext4/readpage.c
index f329daf6e5c7..e9ecfc7830d7 100644
--- a/fs/ext4/readpage.c
+++ b/fs/ext4/readpage.c
@@ -61,6 +61,7 @@ enum bio_post_read_step {
struct bio_post_read_ctx {
struct bio *bio;
+ struct super_block *sb;
struct work_struct work;
unsigned int cur_step;
unsigned int enabled_steps;
@@ -132,7 +133,7 @@ static void bio_post_read_processing(struct bio_post_read_ctx *ctx)
case STEP_VERITY:
if (ctx->enabled_steps & (1 << STEP_VERITY)) {
INIT_WORK(&ctx->work, verity_work);
- fsverity_enqueue_verify_work(&ctx->work);
+ fsverity_enqueue_verify_work(ctx->sb, &ctx->work);
return;
}
ctx->cur_step++;
@@ -195,6 +196,7 @@ static void ext4_set_bio_post_read_ctx(struct bio *bio,
mempool_alloc(bio_post_read_ctx_pool, GFP_NOFS);
ctx->bio = bio;
+ ctx->sb = inode->i_sb;
ctx->enabled_steps = post_read_steps;
bio->bi_private = ctx;
}
diff --git a/fs/f2fs/compress.c b/fs/f2fs/compress.c
index b3c1df93a163..31b45abb26b7 100644
--- a/fs/f2fs/compress.c
+++ b/fs/f2fs/compress.c
@@ -1839,7 +1839,8 @@ void f2fs_decompress_end_io(struct decompress_io_ctx *dic, bool failed,
* file, and these metadata pages may be compressed.
*/
INIT_WORK(&dic->verity_work, f2fs_verify_cluster);
- fsverity_enqueue_verify_work(&dic->verity_work);
+ fsverity_enqueue_verify_work(dic->inode->i_sb,
+ &dic->verity_work);
return;
}
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 31e892842625..e4e4697ad0d3 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -215,7 +215,7 @@ static void f2fs_verify_and_finish_bio(struct bio *bio, bool in_task)
if (ctx && (ctx->enabled_steps & STEP_VERITY)) {
INIT_WORK(&ctx->work, f2fs_verify_bio);
- fsverity_enqueue_verify_work(&ctx->work);
+ fsverity_enqueue_verify_work(ctx->sbi->sb, &ctx->work);
} else {
f2fs_finish_read_bio(bio, in_task);
}
diff --git a/fs/verity/verify.c b/fs/verity/verify.c
index 917a5fb5388f..ef5d27039c98 100644
--- a/fs/verity/verify.c
+++ b/fs/verity/verify.c
@@ -363,13 +363,15 @@ EXPORT_SYMBOL_GPL(fsverity_init_wq);
/**
* fsverity_enqueue_verify_work() - enqueue work on the fs-verity workqueue
+ * @sb: superblock for this filesystem
* @work: the work to enqueue
*
* Enqueue verification work for asynchronous processing.
*/
-void fsverity_enqueue_verify_work(struct work_struct *work)
+void fsverity_enqueue_verify_work(struct super_block *sb,
+ struct work_struct *work)
{
- queue_work(fsverity_read_workqueue, work);
+ queue_work(sb->s_verity_wq ?: fsverity_read_workqueue, work);
}
EXPORT_SYMBOL_GPL(fsverity_enqueue_verify_work);
diff --git a/include/linux/fsverity.h b/include/linux/fsverity.h
index d263e988bec2..8155407a7e4c 100644
--- a/include/linux/fsverity.h
+++ b/include/linux/fsverity.h
@@ -186,7 +186,8 @@ int fsverity_ioctl_read_metadata(struct file *filp, const void __user *uarg);
bool fsverity_verify_blocks(struct folio *folio, size_t len, size_t offset);
void fsverity_verify_bio(struct bio *bio);
-void fsverity_enqueue_verify_work(struct work_struct *work);
+void fsverity_enqueue_verify_work(struct super_block *sb,
+ struct work_struct *work);
int fsverity_init_wq(struct super_block *sb, unsigned int wq_flags,
int max_active);
@@ -271,7 +272,8 @@ static inline void fsverity_verify_bio(struct bio *bio)
WARN_ON_ONCE(1);
}
-static inline void fsverity_enqueue_verify_work(struct work_struct *work)
+static inline void fsverity_enqueue_verify_work(struct super_block *sb,
+ struct work_struct *work)
{
WARN_ON_ONCE(1);
}
--
2.50.0
* [PATCH RFC 08/29] ext4: use a per-superblock fsverity workqueue
2025-07-28 20:30 [PATCH RFC 00/29] fs-verity support for XFS with post EOF merkle tree Andrey Albershteyn
` (6 preceding siblings ...)
2025-07-28 20:30 ` [PATCH RFC 07/29] fsverity: pass super_block to fsverity_enqueue_verify_work Andrey Albershteyn
@ 2025-07-28 20:30 ` Andrey Albershteyn
2025-07-28 20:30 ` [PATCH RFC 09/29] f2fs: " Andrey Albershteyn
` (20 subsequent siblings)
28 siblings, 0 replies; 75+ messages in thread
From: Andrey Albershteyn @ 2025-07-28 20:30 UTC (permalink / raw)
To: fsverity, linux-fsdevel, linux-xfs, david, djwong, ebiggers, hch
From: "Darrick J. Wong" <djwong@kernel.org>
Switch ext4 to use a per-sb fsverity workqueue instead of a systemwide
workqueue.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
fs/ext4/super.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index c7d39da7e733..dcb9d1933c2e 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -5373,6 +5373,17 @@ static int __ext4_fill_super(struct fs_context *fc, struct super_block *sb)
#endif
#ifdef CONFIG_FS_VERITY
sb->s_vop = &ext4_verityops;
+ /*
+ * Use a high-priority workqueue to prioritize verification work, which
+ * blocks reads from completing, over regular application tasks.
+ *
+ * For performance reasons, don't use an unbound workqueue. Using an
+ * unbound workqueue for crypto operations causes excessive scheduler
+ * latency on ARM64.
+ */
+ err = fsverity_init_wq(sb, WQ_HIGHPRI, num_online_cpus());
+ if (err)
+ goto failed_mount3a;
#endif
#ifdef CONFIG_QUOTA
sb->dq_op = &ext4_quota_operations;
--
2.50.0
* [PATCH RFC 09/29] f2fs: use a per-superblock fsverity workqueue
2025-07-28 20:30 [PATCH RFC 00/29] fs-verity support for XFS with post EOF merkle tree Andrey Albershteyn
` (7 preceding siblings ...)
2025-07-28 20:30 ` [PATCH RFC 08/29] ext4: use a per-superblock fsverity workqueue Andrey Albershteyn
@ 2025-07-28 20:30 ` Andrey Albershteyn
2025-07-28 20:30 ` [PATCH RFC 10/29] btrfs: " Andrey Albershteyn
` (19 subsequent siblings)
28 siblings, 0 replies; 75+ messages in thread
From: Andrey Albershteyn @ 2025-07-28 20:30 UTC (permalink / raw)
To: fsverity, linux-fsdevel, linux-xfs, david, djwong, ebiggers, hch
From: "Darrick J. Wong" <djwong@kernel.org>
Switch f2fs to use a per-sb fsverity workqueue instead of a systemwide
workqueue.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
fs/f2fs/super.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index bbf1dad6843f..043e76b8efae 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -4634,6 +4634,17 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
#endif
#ifdef CONFIG_FS_VERITY
sb->s_vop = &f2fs_verityops;
+ /*
+ * Use a high-priority workqueue to prioritize verification work, which
+ * blocks reads from completing, over regular application tasks.
+ *
+ * For performance reasons, don't use an unbound workqueue. Using an
+ * unbound workqueue for crypto operations causes excessive scheduler
+ * latency on ARM64.
+ */
+ err = fsverity_init_wq(sb, WQ_HIGHPRI, num_online_cpus());
+ if (err)
+ goto free_bio_info;
#endif
sb->s_xattr = f2fs_xattr_handlers;
sb->s_export_op = &f2fs_export_ops;
--
2.50.0
* [PATCH RFC 10/29] btrfs: use a per-superblock fsverity workqueue
2025-07-28 20:30 [PATCH RFC 00/29] fs-verity support for XFS with post EOF merkle tree Andrey Albershteyn
` (8 preceding siblings ...)
2025-07-28 20:30 ` [PATCH RFC 09/29] f2fs: " Andrey Albershteyn
@ 2025-07-28 20:30 ` Andrey Albershteyn
2025-07-28 20:30 ` [PATCH RFC 11/29] fsverity: remove system-wide workqueue Andrey Albershteyn
` (18 subsequent siblings)
28 siblings, 0 replies; 75+ messages in thread
From: Andrey Albershteyn @ 2025-07-28 20:30 UTC (permalink / raw)
To: fsverity, linux-fsdevel, linux-xfs, david, djwong, ebiggers, hch
From: "Darrick J. Wong" <djwong@kernel.org>
Switch btrfs to use a per-sb fsverity workqueue instead of a systemwide
workqueue.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
fs/btrfs/super.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index a0c65adce1ab..cf82c260da9c 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -28,6 +28,7 @@
#include <linux/btrfs.h>
#include <linux/security.h>
#include <linux/fs_parser.h>
+#include <linux/fsverity.h>
#include "messages.h"
#include "delayed-inode.h"
#include "ctree.h"
@@ -954,6 +955,19 @@ static int btrfs_fill_super(struct super_block *sb,
sb->s_export_op = &btrfs_export_ops;
#ifdef CONFIG_FS_VERITY
sb->s_vop = &btrfs_verityops;
+ /*
+ * Use a high-priority workqueue to prioritize verification work, which
+ * blocks reads from completing, over regular application tasks.
+ *
+ * For performance reasons, don't use an unbound workqueue. Using an
+ * unbound workqueue for crypto operations causes excessive scheduler
+ * latency on ARM64.
+ */
+ err = fsverity_init_wq(sb, WQ_HIGHPRI, num_online_cpus());
+ if (err) {
+ btrfs_err(fs_info, "fsverity_init_wq failed");
+ return err;
+ }
#endif
sb->s_xattr = btrfs_xattr_handlers;
sb->s_time_gran = 1;
--
2.50.0
* [PATCH RFC 11/29] fsverity: remove system-wide workqueue
2025-07-28 20:30 [PATCH RFC 00/29] fs-verity support for XFS with post EOF merkle tree Andrey Albershteyn
` (9 preceding siblings ...)
2025-07-28 20:30 ` [PATCH RFC 10/29] btrfs: " Andrey Albershteyn
@ 2025-07-28 20:30 ` Andrey Albershteyn
2025-07-28 20:30 ` [PATCH RFC 12/29] fsverity: expose merkle tree geometry to callers Andrey Albershteyn
` (17 subsequent siblings)
28 siblings, 0 replies; 75+ messages in thread
From: Andrey Albershteyn @ 2025-07-28 20:30 UTC (permalink / raw)
To: fsverity, linux-fsdevel, linux-xfs, david, djwong, ebiggers, hch
From: "Darrick J. Wong" <djwong@kernel.org>
Now that we've made the verity workqueue per-superblock, we don't need
the systemwide workqueue. Get rid of the old implementation.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
fs/verity/fsverity_private.h | 2 --
fs/verity/init.c | 1 -
fs/verity/verify.c | 21 +--------------------
3 files changed, 1 insertion(+), 23 deletions(-)
diff --git a/fs/verity/fsverity_private.h b/fs/verity/fsverity_private.h
index 04dd471d791c..2de28be775ea 100644
--- a/fs/verity/fsverity_private.h
+++ b/fs/verity/fsverity_private.h
@@ -152,8 +152,6 @@ static inline void fsverity_init_signature(void)
/* verify.c */
-void __init fsverity_init_workqueue(void);
-
#include <trace/events/fsverity.h>
#endif /* _FSVERITY_PRIVATE_H */
diff --git a/fs/verity/init.c b/fs/verity/init.c
index d65206608583..b252493496df 100644
--- a/fs/verity/init.c
+++ b/fs/verity/init.c
@@ -61,7 +61,6 @@ static int __init fsverity_init(void)
{
fsverity_check_hash_algs();
fsverity_init_info_cache();
- fsverity_init_workqueue();
fsverity_init_sysctl();
fsverity_init_signature();
fsverity_init_bpf();
diff --git a/fs/verity/verify.c b/fs/verity/verify.c
index ef5d27039c98..0edbf514dd31 100644
--- a/fs/verity/verify.c
+++ b/fs/verity/verify.c
@@ -10,8 +10,6 @@
#include <crypto/hash.h>
#include <linux/bio.h>
-static struct workqueue_struct *fsverity_read_workqueue;
-
/*
* Returns true if the hash block with index @hblock_idx in the tree, located in
* @hpage, has already been verified.
@@ -371,23 +369,6 @@ EXPORT_SYMBOL_GPL(fsverity_init_wq);
void fsverity_enqueue_verify_work(struct super_block *sb,
struct work_struct *work)
{
- queue_work(sb->s_verity_wq ?: fsverity_read_workqueue, work);
+ queue_work(sb->s_verity_wq, work);
}
EXPORT_SYMBOL_GPL(fsverity_enqueue_verify_work);
-
-void __init fsverity_init_workqueue(void)
-{
- /*
- * Use a high-priority workqueue to prioritize verification work, which
- * blocks reads from completing, over regular application tasks.
- *
- * For performance reasons, don't use an unbound workqueue. Using an
- * unbound workqueue for crypto operations causes excessive scheduler
- * latency on ARM64.
- */
- fsverity_read_workqueue = alloc_workqueue("fsverity_read_queue",
- WQ_HIGHPRI,
- num_online_cpus());
- if (!fsverity_read_workqueue)
- panic("failed to allocate fsverity_read_queue");
-}
--
2.50.0
* [PATCH RFC 12/29] fsverity: expose merkle tree geometry to callers
2025-07-28 20:30 [PATCH RFC 00/29] fs-verity support for XFS with post EOF merkle tree Andrey Albershteyn
` (10 preceding siblings ...)
2025-07-28 20:30 ` [PATCH RFC 11/29] fsverity: remove system-wide workqueue Andrey Albershteyn
@ 2025-07-28 20:30 ` Andrey Albershteyn
2025-08-11 11:48 ` Christoph Hellwig
2025-07-28 20:30 ` [PATCH RFC 13/29] iomap: integrate fs-verity verification into iomap's read path Andrey Albershteyn
` (16 subsequent siblings)
28 siblings, 1 reply; 75+ messages in thread
From: Andrey Albershteyn @ 2025-07-28 20:30 UTC (permalink / raw)
To: fsverity, linux-fsdevel, linux-xfs, david, djwong, ebiggers, hch
From: "Darrick J. Wong" <djwong@kernel.org>
Create a function that will return selected information about the
geometry of the merkle tree. Online fsck for XFS will need this piece
to perform basic checks of the merkle tree.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
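As a rough illustration of what "tree size" means here, the following userspace sketch computes the number of bytes a Merkle tree occupies for a given file size, tree block size and digest size. It assumes the usual fs-verity layout, where each level holds one digest per block of the level below and the tree stops once a level fits in a single block. merkle_tree_size() is a hypothetical helper, not the kernel function added by this patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: total size in bytes of all Merkle tree levels.
 * block_size is the tree block size, digest_size the hash output size
 * (for example 32 for SHA-256).
 */
static uint64_t merkle_tree_size(uint64_t file_size, uint32_t block_size,
				 uint32_t digest_size)
{
	uint64_t hashes_per_block, blocks, total = 0;

	/* A block must hold at least two digests, or the tree never shrinks. */
	assert(digest_size != 0 && block_size >= 2 * digest_size);
	hashes_per_block = block_size / digest_size;

	/* Start from the number of data blocks and walk up level by level. */
	blocks = (file_size + block_size - 1) / block_size;
	while (blocks > 1) {
		blocks = (blocks + hashes_per_block - 1) / hashes_per_block;
		total += blocks;
	}
	return total * block_size;
}
```

For a 1 MiB file with 4096-byte blocks and 32-byte digests this gives two blocks on the first level and one on the second, 12288 bytes in total. A file that fits in a single block needs no tree blocks at all.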
fs/verity/open.c | 37 +++++++++++++++++++++++++++++++++++++
include/linux/fsverity.h | 11 +++++++++++
2 files changed, 48 insertions(+)
diff --git a/fs/verity/open.c b/fs/verity/open.c
index fdeb95eca3af..de1d0bd6e703 100644
--- a/fs/verity/open.c
+++ b/fs/verity/open.c
@@ -407,6 +407,43 @@ void __fsverity_cleanup_inode(struct inode *inode)
}
EXPORT_SYMBOL_GPL(__fsverity_cleanup_inode);
+/**
+ * fsverity_merkle_tree_geometry() - return Merkle tree geometry
+ * @inode: the inode to query
+ * @log_blocksize: will be set to the log2 of the size of a merkle tree block
+ * @block_size: will be set to the size of a merkle tree block, in bytes
+ * @tree_size: will be set to the size of the merkle tree, in bytes
+ *
+ * Callers are not required to have opened the file.
+ *
+ * Return: 0 for success, -ENODATA if verity is not enabled, or any of the
+ * error codes that can result from loading verity information while opening a
+ * file.
+ */
+int fsverity_merkle_tree_geometry(struct inode *inode, u8 *log_blocksize,
+ unsigned int *block_size, u64 *tree_size)
+{
+ struct fsverity_info *vi;
+ int error;
+
+ if (!IS_VERITY(inode))
+ return -ENODATA;
+
+ error = ensure_verity_info(inode);
+ if (error)
+ return error;
+
+ vi = inode->i_verity_info;
+ if (log_blocksize)
+ *log_blocksize = vi->tree_params.log_blocksize;
+ if (block_size)
+ *block_size = vi->tree_params.block_size;
+ if (tree_size)
+ *tree_size = vi->tree_params.tree_size;
+ return 0;
+}
+EXPORT_SYMBOL_GPL(fsverity_merkle_tree_geometry);
+
void __init fsverity_init_info_cache(void)
{
fsverity_info_cachep = KMEM_CACHE_USERCOPY(
diff --git a/include/linux/fsverity.h b/include/linux/fsverity.h
index 8155407a7e4c..f10e9493ffa7 100644
--- a/include/linux/fsverity.h
+++ b/include/linux/fsverity.h
@@ -166,6 +166,9 @@ int __fsverity_file_open(struct inode *inode, struct file *filp);
int __fsverity_prepare_setattr(struct dentry *dentry, struct iattr *attr);
void __fsverity_cleanup_inode(struct inode *inode);
+int fsverity_merkle_tree_geometry(struct inode *inode, u8 *log_blocksize,
+ unsigned int *block_size, u64 *tree_size);
+
/**
* fsverity_cleanup_inode() - free the inode's verity info, if present
* @inode: an inode being evicted
@@ -250,6 +253,14 @@ static inline void fsverity_cleanup_inode(struct inode *inode)
{
}
+static inline int fsverity_merkle_tree_geometry(struct inode *inode,
+ u8 *log_blocksize,
+ unsigned int *block_size,
+ u64 *tree_size)
+{
+ return -EOPNOTSUPP;
+}
+
/* read_metadata.c */
static inline int fsverity_ioctl_read_metadata(struct file *filp,
--
2.50.0
* [PATCH RFC 13/29] iomap: integrate fs-verity verification into iomap's read path
2025-07-28 20:30 [PATCH RFC 00/29] fs-verity support for XFS with post EOF merkle tree Andrey Albershteyn
` (11 preceding siblings ...)
2025-07-28 20:30 ` [PATCH RFC 12/29] fsverity: expose merkle tree geometry to callers Andrey Albershteyn
@ 2025-07-28 20:30 ` Andrey Albershteyn
2025-07-29 23:21 ` Darrick J. Wong
2025-07-28 20:30 ` [PATCH RFC 14/29] xfs: add attribute type for fs-verity Andrey Albershteyn
` (15 subsequent siblings)
28 siblings, 1 reply; 75+ messages in thread
From: Andrey Albershteyn @ 2025-07-28 20:30 UTC (permalink / raw)
To: fsverity, linux-fsdevel, linux-xfs, david, djwong, ebiggers, hch
Cc: Andrey Albershteyn, Andrey Albershteyn
From: Andrey Albershteyn <aalbersh@redhat.com>
This patch adds fs-verity verification to iomap's read path. After a
bio's I/O completes, the data is verified against fs-verity's Merkle
tree. Verification work is done in a separate workqueue.
The read path's per-bio work item, struct iomap_fsverity_bio, is
allocated side by side with the bio if FS_VERITY is enabled.
[djwong: fix doc warning]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
---
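The arithmetic behind the tail-zeroing check in iomap_fsverity_tree_end_align() can be tried in userspace with the sketch below. It assumes 4 KiB pages and models only the mask and index comparison, not the early-return checks. The SKETCH_* constants and both function names are hypothetical, and the shift uses a 64-bit constant so that large tree sizes do not overflow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT	12	/* assumption: 4 KiB pages */
#define SKETCH_PAGE_SIZE	(1ULL << SKETCH_PAGE_SHIFT)

/* Portable stand-in for the kernel's fls64(): 1-based index of the top bit. */
static int fls64_sketch(uint64_t x)
{
	int n = 0;

	while (x) {
		n++;
		x >>= 1;
	}
	return n;
}

/* True if pos falls into the last page-sized block of the tree region. */
static bool pos_in_last_tree_page(uint64_t pos, uint64_t tree_size)
{
	uint64_t last_block_tree, tree_mask, last_block_pos;

	/* Keep the shift below 64 bits wide. */
	assert(tree_size != 0 && (tree_size >> 63) == 0);

	last_block_tree = (tree_size + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT;
	tree_mask = (1ULL << fls64_sketch(tree_size)) - 1;
	last_block_pos = ((pos & tree_mask) >> SKETCH_PAGE_SHIFT) + 1;

	return last_block_tree == last_block_pos;
}
```

With the tree placed at offset 1 << 53 and a 9000-byte tree (three pages), only positions in the third page satisfy the check.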
fs/iomap/buffered-io.c | 151 +++++++++++++++++++++++++++++++++++++++++++++++--
fs/iomap/ioend.c | 41 +++++++++++++-
include/linux/iomap.h | 13 +++++
3 files changed, 198 insertions(+), 7 deletions(-)
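The lazy bioset setup in iomap_fsverity_init_bioset() follows an "allocate, publish with compare-and-swap, drop ours if we lost" pattern. Below is a userspace sketch of that pattern using C11 atomics instead of the kernel's cmpxchg(). struct pool and the shared_pool_* names are hypothetical stand-ins for the bio_set and its accessors.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Stand-in for struct bio_set. */
struct pool {
	int capacity;
};

/* Published pointer; NULL until the first successful initialisation. */
static _Atomic(struct pool *) shared_pool;

static int shared_pool_init(int capacity)
{
	struct pool *expected = NULL;
	struct pool *p;

	assert(capacity > 0);

	p = calloc(1, sizeof(*p));
	if (!p)
		return -ENOMEM;
	p->capacity = capacity;

	/*
	 * Publish only if nobody else did. On failure another caller won
	 * the race, so free our copy and keep using theirs.
	 */
	if (!atomic_compare_exchange_strong(&shared_pool, &expected, p))
		free(p);
	return 0;
}

static struct pool *shared_pool_get(void)
{
	return atomic_load(&shared_pool);
}
```

Both the winner and the loser return success, because either way a usable pool is published once the call returns.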
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index e959a206cba9..87c974e543e0 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -6,6 +6,7 @@
#include <linux/module.h>
#include <linux/compiler.h>
#include <linux/fs.h>
+#include <linux/fsverity.h>
#include <linux/iomap.h>
#include <linux/pagemap.h>
#include <linux/uio.h>
@@ -363,6 +364,116 @@ static inline bool iomap_block_needs_zeroing(const struct iomap_iter *iter,
pos >= i_size_read(iter->inode);
}
+#ifdef CONFIG_FS_VERITY
+int iomap_init_fsverity(struct super_block *sb, unsigned int wq_flags,
+ int max_active)
+{
+ int ret;
+
+ if (!iomap_fsverity_bioset) {
+ ret = iomap_fsverity_init_bioset();
+ if (ret)
+ return ret;
+ }
+
+ return fsverity_init_wq(sb, wq_flags, max_active);
+}
+EXPORT_SYMBOL_GPL(iomap_init_fsverity);
+
+static void
+iomap_read_fsverify_end_io_work(struct work_struct *work)
+{
+ struct iomap_fsverity_bio *fbio =
+ container_of(work, struct iomap_fsverity_bio, work);
+
+ fsverity_verify_bio(&fbio->bio);
+ iomap_read_end_io(&fbio->bio);
+}
+
+static void
+iomap_read_fsverity_end_io(struct bio *bio)
+{
+ struct iomap_fsverity_bio *fbio =
+ container_of(bio, struct iomap_fsverity_bio, bio);
+
+ INIT_WORK(&fbio->work, iomap_read_fsverify_end_io_work);
+ queue_work(bio->bi_private, &fbio->work);
+}
+
+static struct bio *
+iomap_fsverity_read_bio_alloc(struct inode *inode, struct block_device *bdev,
+ int nr_vecs, gfp_t gfp)
+{
+ struct bio *bio;
+
+ bio = bio_alloc_bioset(bdev, nr_vecs, REQ_OP_READ, gfp,
+ iomap_fsverity_bioset);
+ if (bio) {
+ bio->bi_private = inode->i_sb->s_verity_wq;
+ bio->bi_end_io = iomap_read_fsverity_end_io;
+ }
+ return bio;
+}
+
+/*
+ * True if the tree is not aligned with the fs block/folio size and we need to
+ * zero the tail part of the folio.
+ */
+static bool
+iomap_fsverity_tree_end_align(struct iomap_iter *iter, struct folio *folio,
+ loff_t pos, size_t plen)
+{
+ int error;
+ u8 log_blocksize;
+ u64 tree_size, tree_mask, last_block_tree, last_block_pos;
+
+ /* Not a Merkle tree */
+ if (!(iter->iomap.flags & IOMAP_F_BEYOND_EOF))
+ return false;
+
+ if (plen == folio_size(folio))
+ return false;
+
+ if (iter->inode->i_blkbits == folio_shift(folio))
+ return false;
+
+ error = fsverity_merkle_tree_geometry(iter->inode, &log_blocksize, NULL,
+ &tree_size);
+ if (error)
+ return false;
+
+ /*
+ * We are beyond EOF, reading the Merkle tree, which sits at the highest
+ * offset. Mask pos with the tree size to get our position within the
+ * tree. Then compare the index of the last tree block with the index of
+ * the block containing pos.
+ */
+ last_block_tree = (tree_size + PAGE_SIZE - 1) >> PAGE_SHIFT;
+ tree_mask = (1ULL << fls64(tree_size)) - 1;
+ last_block_pos = ((pos & tree_mask) >> PAGE_SHIFT) + 1;
+
+ return last_block_tree == last_block_pos;
+}
+#else
+# define iomap_fsverity_read_bio_alloc(...) (NULL)
+# define iomap_fsverity_tree_end_align(...) (false)
+#endif /* CONFIG_FS_VERITY */
+
+static struct bio *iomap_read_bio_alloc(struct inode *inode,
+ const struct iomap *iomap, int nr_vecs, gfp_t gfp)
+{
+ struct bio *bio;
+ struct block_device *bdev = iomap->bdev;
+
+ if (fsverity_active(inode) && !(iomap->flags & IOMAP_F_BEYOND_EOF))
+ return iomap_fsverity_read_bio_alloc(inode, bdev, nr_vecs, gfp);
+
+ bio = bio_alloc(bdev, nr_vecs, REQ_OP_READ, gfp);
+ if (bio)
+ bio->bi_end_io = iomap_read_end_io;
+ return bio;
+}
+
static int iomap_readpage_iter(struct iomap_iter *iter,
struct iomap_readpage_ctx *ctx)
{
@@ -375,6 +486,10 @@ static int iomap_readpage_iter(struct iomap_iter *iter,
sector_t sector;
int ret;
+ /* Fail reads from broken fsverity files immediately. */
+ if (IS_VERITY(iter->inode) && !fsverity_active(iter->inode))
+ return -EIO;
+
if (iomap->type == IOMAP_INLINE) {
ret = iomap_read_inline_data(iter, folio);
if (ret)
@@ -391,6 +506,11 @@ static int iomap_readpage_iter(struct iomap_iter *iter,
if (iomap_block_needs_zeroing(iter, pos) &&
!(iomap->flags & IOMAP_F_BEYOND_EOF)) {
folio_zero_range(folio, poff, plen);
+ if (fsverity_active(iter->inode) &&
+ !fsverity_verify_blocks(folio, plen, poff)) {
+ return -EIO;
+ }
+
iomap_set_range_uptodate(folio, poff, plen);
goto done;
}
@@ -408,32 +528,51 @@ static int iomap_readpage_iter(struct iomap_iter *iter,
!bio_add_folio(ctx->bio, folio, plen, poff)) {
gfp_t gfp = mapping_gfp_constraint(folio->mapping, GFP_KERNEL);
gfp_t orig_gfp = gfp;
- unsigned int nr_vecs = DIV_ROUND_UP(length, PAGE_SIZE);
if (ctx->bio)
submit_bio(ctx->bio);
if (ctx->rac) /* same as readahead_gfp_mask */
gfp |= __GFP_NORETRY | __GFP_NOWARN;
- ctx->bio = bio_alloc(iomap->bdev, bio_max_segs(nr_vecs),
- REQ_OP_READ, gfp);
+
+ ctx->bio = iomap_read_bio_alloc(iter->inode, iomap,
+ bio_max_segs(DIV_ROUND_UP(length, PAGE_SIZE)),
+ gfp);
+
/*
* If the bio_alloc fails, try it again for a single page to
* avoid having to deal with partial page reads. This emulates
* what do_mpage_read_folio does.
*/
if (!ctx->bio) {
- ctx->bio = bio_alloc(iomap->bdev, 1, REQ_OP_READ,
- orig_gfp);
+ ctx->bio = iomap_read_bio_alloc(iter->inode,
+ iomap, 1, orig_gfp);
}
if (ctx->rac)
ctx->bio->bi_opf |= REQ_RAHEAD;
ctx->bio->bi_iter.bi_sector = sector;
- ctx->bio->bi_end_io = iomap_read_end_io;
bio_add_folio_nofail(ctx->bio, folio, plen, poff);
}
done:
+ /*
+ * For a post-EOF region, zero the part of the folio which won't be read.
+ * This happens at the end of the region. So far, the only user is
+ * fs-verity, which stores a contiguous data region.
+ *
+ * We also check the fs block size, as plen could be smaller than the
+ * block size. If we zero part of the block and mark the whole block
+ * (iomap_set_range_uptodate() works with fsblocks) as uptodate,
+ * iomap_finish_folio_read() will toggle the uptodate bit when the folio
+ * is read.
+ */
+ if (iomap_fsverity_tree_end_align(iter, folio, pos, plen)) {
+ folio_zero_range(folio, poff + plen,
+ folio_size(folio) - (poff + plen));
+ iomap_set_range_uptodate(folio, poff + plen,
+ folio_size(folio) - (poff + plen));
+ }
+
/*
* Move the caller beyond our range so that it keeps making progress.
* For that, we have to include any leading non-uptodate ranges, but
diff --git a/fs/iomap/ioend.c b/fs/iomap/ioend.c
index 18894ebba6db..400751d82313 100644
--- a/fs/iomap/ioend.c
+++ b/fs/iomap/ioend.c
@@ -6,6 +6,8 @@
#include <linux/list_sort.h>
#include "internal.h"
+#define IOMAP_POOL_SIZE (4 * (PAGE_SIZE / SECTOR_SIZE))
+
struct bio_set iomap_ioend_bioset;
EXPORT_SYMBOL_GPL(iomap_ioend_bioset);
@@ -207,9 +209,46 @@ struct iomap_ioend *iomap_split_ioend(struct iomap_ioend *ioend,
}
EXPORT_SYMBOL_GPL(iomap_split_ioend);
+#ifdef CONFIG_FS_VERITY
+struct bio_set *iomap_fsverity_bioset;
+EXPORT_SYMBOL_GPL(iomap_fsverity_bioset);
+int iomap_fsverity_init_bioset(void)
+{
+ struct bio_set *bs, *old;
+ int error;
+
+ bs = kzalloc(sizeof(*bs), GFP_KERNEL);
+ if (!bs)
+ return -ENOMEM;
+
+ error = bioset_init(bs, IOMAP_POOL_SIZE,
+ offsetof(struct iomap_fsverity_bio, bio),
+ BIOSET_NEED_BVECS);
+ if (error) {
+ kfree(bs);
+ return error;
+ }
+
+ /*
+ * This has to be atomic as readaheads can race to create the
+ * bioset. If someone set the pointer before us, we drop ours.
+ */
+ old = cmpxchg(&iomap_fsverity_bioset, NULL, bs);
+ if (old) {
+ bioset_exit(bs);
+ kfree(bs);
+ }
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(iomap_fsverity_init_bioset);
+#else
+# define iomap_fsverity_init_bioset(...) (-EOPNOTSUPP)
+#endif
+
static int __init iomap_ioend_init(void)
{
- return bioset_init(&iomap_ioend_bioset, 4 * (PAGE_SIZE / SECTOR_SIZE),
+ return bioset_init(&iomap_ioend_bioset, IOMAP_POOL_SIZE,
offsetof(struct iomap_ioend, io_bio),
BIOSET_NEED_BVECS);
}
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 73288f28543f..f845876ad8d2 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -339,6 +339,19 @@ static inline bool iomap_want_unshare_iter(const struct iomap_iter *iter)
iter->srcmap.type == IOMAP_MAPPED;
}
+#ifdef CONFIG_FS_VERITY
+extern struct bio_set *iomap_fsverity_bioset;
+
+struct iomap_fsverity_bio {
+ struct work_struct work;
+ struct bio bio;
+};
+
+int iomap_init_fsverity(struct super_block *sb, unsigned int wq_flags,
+ int max_active);
+int iomap_fsverity_init_bioset(void);
+#endif
+
ssize_t iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *from,
const struct iomap_ops *ops, void *private);
int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops);
--
2.50.0
* [PATCH RFC 14/29] xfs: add attribute type for fs-verity
2025-07-28 20:30 [PATCH RFC 00/29] fs-verity support for XFS with post EOF merkle tree Andrey Albershteyn
` (12 preceding siblings ...)
2025-07-28 20:30 ` [PATCH RFC 13/29] iomap: integrate fs-verity verification into iomap's read path Andrey Albershteyn
@ 2025-07-28 20:30 ` Andrey Albershteyn
2025-08-11 11:50 ` Christoph Hellwig
2025-07-28 20:30 ` [PATCH RFC 15/29] xfs: add fs-verity ro-compat flag Andrey Albershteyn
` (14 subsequent siblings)
28 siblings, 1 reply; 75+ messages in thread
From: Andrey Albershteyn @ 2025-07-28 20:30 UTC (permalink / raw)
To: fsverity, linux-fsdevel, linux-xfs, david, djwong, ebiggers, hch
Cc: Andrey Albershteyn, Andrey Albershteyn
From: Andrey Albershteyn <aalbersh@redhat.com>
The fsverity descriptor is stored in the extended attributes of the
inode. Add a new attribute type for fs-verity metadata. Extend
XFS_ATTR_PRIVATE_NSP_MASK to cover fs-verity attributes alongside parent
pointers, as both are only for internal use. While we're at it, add a few
comments in relevant places that internally visible attributes are not
supposed to be handled via the interface defined in xfs_xattr.c.
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
---
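To show how the new bit composes with the existing namespace masks, here is a small userspace sketch. The bit numbers and mask definitions are copied from the hunk below. attr_visible_to_userspace() is a hypothetical helper added only for illustration and does not exist in XFS.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit positions as they appear in xfs_da_format.h after this patch. */
#define XFS_ATTR_ROOT_BIT	1
#define XFS_ATTR_SECURE_BIT	2
#define XFS_ATTR_PARENT_BIT	3
#define XFS_ATTR_VERITY_BIT	4

#define XFS_ATTR_ROOT		(1u << XFS_ATTR_ROOT_BIT)
#define XFS_ATTR_SECURE		(1u << XFS_ATTR_SECURE_BIT)
#define XFS_ATTR_PARENT		(1u << XFS_ATTR_PARENT_BIT)
#define XFS_ATTR_VERITY		(1u << XFS_ATTR_VERITY_BIT)

#define XFS_ATTR_NSP_ONDISK_MASK	(XFS_ATTR_ROOT | \
					 XFS_ATTR_SECURE | \
					 XFS_ATTR_PARENT | \
					 XFS_ATTR_VERITY)

/* Private attr namespaces not exposed to userspace */
#define XFS_ATTR_PRIVATE_NSP_MASK	(XFS_ATTR_PARENT | \
					 XFS_ATTR_VERITY)

/* Hypothetical helper: is an attr with these flags reachable by userspace? */
static bool attr_visible_to_userspace(unsigned int attr_flags)
{
	/* On-disk attr flags fit in a single byte. */
	assert(attr_flags <= 0xffu);
	return !(attr_flags & XFS_ATTR_PRIVATE_NSP_MASK);
}
```

With these definitions, root and secure attributes stay visible, while parent pointer and verity attributes are filtered out by the private namespace mask.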
fs/xfs/libxfs/xfs_da_format.h | 11 ++++++++---
fs/xfs/libxfs/xfs_log_format.h | 1 +
fs/xfs/xfs_trace.h | 3 ++-
3 files changed, 11 insertions(+), 4 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_da_format.h b/fs/xfs/libxfs/xfs_da_format.h
index 86de99e2f757..e5274be2fe9c 100644
--- a/fs/xfs/libxfs/xfs_da_format.h
+++ b/fs/xfs/libxfs/xfs_da_format.h
@@ -715,19 +715,23 @@ struct xfs_attr3_leafblock {
#define XFS_ATTR_ROOT_BIT 1 /* limit access to trusted attrs */
#define XFS_ATTR_SECURE_BIT 2 /* limit access to secure attrs */
#define XFS_ATTR_PARENT_BIT 3 /* parent pointer attrs */
+#define XFS_ATTR_VERITY_BIT 4 /* verity descriptor */
#define XFS_ATTR_INCOMPLETE_BIT 7 /* attr in middle of create/delete */
#define XFS_ATTR_LOCAL (1u << XFS_ATTR_LOCAL_BIT)
#define XFS_ATTR_ROOT (1u << XFS_ATTR_ROOT_BIT)
#define XFS_ATTR_SECURE (1u << XFS_ATTR_SECURE_BIT)
#define XFS_ATTR_PARENT (1u << XFS_ATTR_PARENT_BIT)
+#define XFS_ATTR_VERITY (1u << XFS_ATTR_VERITY_BIT)
#define XFS_ATTR_INCOMPLETE (1u << XFS_ATTR_INCOMPLETE_BIT)
#define XFS_ATTR_NSP_ONDISK_MASK (XFS_ATTR_ROOT | \
XFS_ATTR_SECURE | \
- XFS_ATTR_PARENT)
+ XFS_ATTR_PARENT | \
+ XFS_ATTR_VERITY)
/* Private attr namespaces not exposed to userspace */
-#define XFS_ATTR_PRIVATE_NSP_MASK (XFS_ATTR_PARENT)
+#define XFS_ATTR_PRIVATE_NSP_MASK (XFS_ATTR_PARENT | \
+ XFS_ATTR_VERITY)
#define XFS_ATTR_ONDISK_MASK (XFS_ATTR_NSP_ONDISK_MASK | \
XFS_ATTR_LOCAL | \
@@ -737,7 +741,8 @@ struct xfs_attr3_leafblock {
{ XFS_ATTR_LOCAL, "local" }, \
{ XFS_ATTR_ROOT, "root" }, \
{ XFS_ATTR_SECURE, "secure" }, \
- { XFS_ATTR_PARENT, "parent" }
+ { XFS_ATTR_PARENT, "parent" }, \
+ { XFS_ATTR_VERITY, "verity" }
/*
* Alignment for namelist and valuelist entries (since they are mixed
diff --git a/fs/xfs/libxfs/xfs_log_format.h b/fs/xfs/libxfs/xfs_log_format.h
index 0d637c276db0..57721f092c80 100644
--- a/fs/xfs/libxfs/xfs_log_format.h
+++ b/fs/xfs/libxfs/xfs_log_format.h
@@ -1051,6 +1051,7 @@ struct xfs_icreate_log {
#define XFS_ATTRI_FILTER_MASK (XFS_ATTR_ROOT | \
XFS_ATTR_SECURE | \
XFS_ATTR_PARENT | \
+ XFS_ATTR_VERITY | \
XFS_ATTR_INCOMPLETE)
/*
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index ba45d801df1c..50034c059e8c 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -108,7 +108,8 @@ struct xfs_open_zone;
{ XFS_ATTR_ROOT, "ROOT" }, \
{ XFS_ATTR_SECURE, "SECURE" }, \
{ XFS_ATTR_INCOMPLETE, "INCOMPLETE" }, \
- { XFS_ATTR_PARENT, "PARENT" }
+ { XFS_ATTR_PARENT, "PARENT" }, \
+ { XFS_ATTR_VERITY, "VERITY" }
DECLARE_EVENT_CLASS(xfs_attr_list_class,
TP_PROTO(struct xfs_attr_list_context *ctx),
--
2.50.0
* [PATCH RFC 15/29] xfs: add fs-verity ro-compat flag
2025-07-28 20:30 [PATCH RFC 00/29] fs-verity support for XFS with post EOF merkle tree Andrey Albershteyn
` (13 preceding siblings ...)
2025-07-28 20:30 ` [PATCH RFC 14/29] xfs: add attribute type for fs-verity Andrey Albershteyn
@ 2025-07-28 20:30 ` Andrey Albershteyn
2025-07-28 20:30 ` [PATCH RFC 16/29] xfs: add inode on-disk VERITY flag Andrey Albershteyn
` (13 subsequent siblings)
28 siblings, 0 replies; 75+ messages in thread
From: Andrey Albershteyn @ 2025-07-28 20:30 UTC (permalink / raw)
To: fsverity, linux-fsdevel, linux-xfs, david, djwong, ebiggers, hch
Cc: Andrey Albershteyn
From: Andrey Albershteyn <aalbersh@redhat.com>
To mark inodes with fs-verity enabled, the new XFS_DIFLAG2_VERITY flag
will be added in a later patch. This requires a ro-compat flag to let
older kernels know that a filesystem with fs-verity cannot be modified.
Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
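A small sketch of why a ro-compat bit is enough here, assuming the usual
policy that a kernel seeing an unknown ro-compat bit allows read-only
mounts only. The bit values mirror XFS_SB_FEAT_RO_COMPAT_* in
xfs_format.h; OLD_KERNEL_RO_COMPAT_ALL, has_unknown_ro_compat() and
may_mount() are hypothetical names used only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define RO_COMPAT_FINOBT	(1u << 0)
#define RO_COMPAT_RMAPBT	(1u << 1)
#define RO_COMPAT_REFLINK	(1u << 2)
#define RO_COMPAT_INOBTCNT	(1u << 3)
#define RO_COMPAT_VERITY	(1u << 4)	/* new in this patch */

/* Hypothetical: the feature set known to a kernel without fs-verity. */
#define OLD_KERNEL_RO_COMPAT_ALL \
	(RO_COMPAT_FINOBT | RO_COMPAT_RMAPBT | \
	 RO_COMPAT_REFLINK | RO_COMPAT_INOBTCNT)

/* True if the superblock carries ro-compat bits this kernel doesn't know. */
static bool has_unknown_ro_compat(uint32_t sb_features, uint32_t known)
{
	return (sb_features & ~known) != 0;
}

/* Assumed policy: unknown ro-compat bits permit read-only mounts only. */
static bool may_mount(uint32_t sb_features, uint32_t known, bool read_only)
{
	return read_only || !has_unknown_ro_compat(sb_features, known);
}
```

Under that assumption an older kernel can still read a verity-enabled
filesystem but cannot modify sealed files behind fs-verity's back.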
fs/xfs/libxfs/xfs_format.h | 1 +
fs/xfs/libxfs/xfs_sb.c | 2 ++
fs/xfs/xfs_mount.h | 2 ++
3 files changed, 5 insertions(+)
diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index 9566a7623365..3a8c43541dd9 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -374,6 +374,7 @@ xfs_sb_has_compat_feature(
#define XFS_SB_FEAT_RO_COMPAT_RMAPBT (1 << 1) /* reverse map btree */
#define XFS_SB_FEAT_RO_COMPAT_REFLINK (1 << 2) /* reflinked files */
#define XFS_SB_FEAT_RO_COMPAT_INOBTCNT (1 << 3) /* inobt block counts */
+#define XFS_SB_FEAT_RO_COMPAT_VERITY (1 << 4) /* fs-verity */
#define XFS_SB_FEAT_RO_COMPAT_ALL \
(XFS_SB_FEAT_RO_COMPAT_FINOBT | \
XFS_SB_FEAT_RO_COMPAT_RMAPBT | \
diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
index 711e180f9ebb..e32cd2874bcd 100644
--- a/fs/xfs/libxfs/xfs_sb.c
+++ b/fs/xfs/libxfs/xfs_sb.c
@@ -167,6 +167,8 @@ xfs_sb_version_to_features(
features |= XFS_FEAT_REFLINK;
if (sbp->sb_features_ro_compat & XFS_SB_FEAT_RO_COMPAT_INOBTCNT)
features |= XFS_FEAT_INOBTCNT;
+ if (sbp->sb_features_ro_compat & XFS_SB_FEAT_RO_COMPAT_VERITY)
+ features |= XFS_FEAT_VERITY;
if (sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_FTYPE)
features |= XFS_FEAT_FTYPE;
if (sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_SPINODES)
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index d85084f9f317..90a4ca0d7a65 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -383,6 +383,7 @@ typedef struct xfs_mount {
#define XFS_FEAT_EXCHANGE_RANGE (1ULL << 27) /* exchange range */
#define XFS_FEAT_METADIR (1ULL << 28) /* metadata directory tree */
#define XFS_FEAT_ZONED (1ULL << 29) /* zoned RT device */
+#define XFS_FEAT_VERITY (1ULL << 30) /* fs-verity */
/* Mount features */
#define XFS_FEAT_NOLIFETIME (1ULL << 47) /* disable lifetime hints */
@@ -442,6 +443,7 @@ __XFS_HAS_FEAT(exchange_range, EXCHANGE_RANGE)
__XFS_HAS_FEAT(metadir, METADIR)
__XFS_HAS_FEAT(zoned, ZONED)
__XFS_HAS_FEAT(nolifetime, NOLIFETIME)
+__XFS_HAS_FEAT(verity, VERITY)
static inline bool xfs_has_rtgroups(const struct xfs_mount *mp)
{
--
2.50.0
* [PATCH RFC 16/29] xfs: add inode on-disk VERITY flag
2025-07-28 20:30 [PATCH RFC 00/29] fs-verity support for XFS with post EOF merkle tree Andrey Albershteyn
` (14 preceding siblings ...)
2025-07-28 20:30 ` [PATCH RFC 15/29] xfs: add fs-verity ro-compat flag Andrey Albershteyn
@ 2025-07-28 20:30 ` Andrey Albershteyn
2025-07-28 20:30 ` [PATCH RFC 17/29] xfs: initialize fs-verity on file open and cleanup on inode destruction Andrey Albershteyn
` (12 subsequent siblings)
28 siblings, 0 replies; 75+ messages in thread
From: Andrey Albershteyn @ 2025-07-28 20:30 UTC (permalink / raw)
To: fsverity, linux-fsdevel, linux-xfs, david, djwong, ebiggers, hch
Cc: Andrey Albershteyn
From: Andrey Albershteyn <aalbersh@redhat.com>
Add a flag to mark inodes which have fs-verity enabled on them (i.e.
the descriptor exists and the tree is built).
Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
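The verifier rule added to xfs_dinode_verify() below can be sketched in
user space as follows. DIFLAG2_VERITY mirrors bit 6 from the patch; the
MODE_* constants are the conventional POSIX file type values, and
verity_flag_valid() is a hypothetical stand-in, not a kernel function.

```c
#include <stdbool.h>
#include <stdint.h>

#define DIFLAG2_VERITY	(1ULL << 6)	/* mirrors XFS_DIFLAG2_VERITY_BIT */

/* Conventional file type bits, defined locally to stay self-contained. */
#define MODE_FMT	0170000
#define MODE_REG	0100000
#define MODE_DIR	0040000

/*
 * Returns true if the inode's verity flag is consistent: the flag may
 * only be set when the filesystem has the verity feature and the inode
 * is a regular file.
 */
static bool verity_flag_valid(uint64_t flags2, uint16_t mode,
			      bool fs_has_verity)
{
	if (!(flags2 & DIFLAG2_VERITY))
		return true;
	if (!fs_has_verity)
		return false;
	return (mode & MODE_FMT) == MODE_REG;
}
```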
fs/xfs/libxfs/xfs_format.h | 7 ++++++-
fs/xfs/libxfs/xfs_inode_buf.c | 8 ++++++++
fs/xfs/libxfs/xfs_inode_util.c | 2 ++
fs/xfs/xfs_iops.c | 2 ++
4 files changed, 18 insertions(+), 1 deletion(-)
diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index 3a8c43541dd9..1cd9106d852b 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -1231,16 +1231,21 @@ static inline void xfs_dinode_put_rdev(struct xfs_dinode *dip, xfs_dev_t rdev)
*/
#define XFS_DIFLAG2_METADATA_BIT 5
+/* inodes sealed with fs-verity */
+#define XFS_DIFLAG2_VERITY_BIT 6
+
#define XFS_DIFLAG2_DAX (1ULL << XFS_DIFLAG2_DAX_BIT)
#define XFS_DIFLAG2_REFLINK (1ULL << XFS_DIFLAG2_REFLINK_BIT)
#define XFS_DIFLAG2_COWEXTSIZE (1ULL << XFS_DIFLAG2_COWEXTSIZE_BIT)
#define XFS_DIFLAG2_BIGTIME (1ULL << XFS_DIFLAG2_BIGTIME_BIT)
#define XFS_DIFLAG2_NREXT64 (1ULL << XFS_DIFLAG2_NREXT64_BIT)
#define XFS_DIFLAG2_METADATA (1ULL << XFS_DIFLAG2_METADATA_BIT)
+#define XFS_DIFLAG2_VERITY (1ULL << XFS_DIFLAG2_VERITY_BIT)
#define XFS_DIFLAG2_ANY \
(XFS_DIFLAG2_DAX | XFS_DIFLAG2_REFLINK | XFS_DIFLAG2_COWEXTSIZE | \
- XFS_DIFLAG2_BIGTIME | XFS_DIFLAG2_NREXT64 | XFS_DIFLAG2_METADATA)
+ XFS_DIFLAG2_BIGTIME | XFS_DIFLAG2_NREXT64 | XFS_DIFLAG2_METADATA | \
+ XFS_DIFLAG2_VERITY)
static inline bool xfs_dinode_has_bigtime(const struct xfs_dinode *dip)
{
diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
index aa13fc00afd7..33a71f85f596 100644
--- a/fs/xfs/libxfs/xfs_inode_buf.c
+++ b/fs/xfs/libxfs/xfs_inode_buf.c
@@ -756,6 +756,14 @@ xfs_dinode_verify(
!xfs_has_rtreflink(mp))
return __this_address;
+ /* only regular files can have fsverity */
+ if (flags2 & XFS_DIFLAG2_VERITY) {
+ if (!xfs_has_verity(mp))
+ return __this_address;
+ if ((mode & S_IFMT) != S_IFREG)
+ return __this_address;
+ }
+
if (xfs_has_zoned(mp) &&
dip->di_metatype == cpu_to_be16(XFS_METAFILE_RTRMAP)) {
if (be32_to_cpu(dip->di_used_blocks) > mp->m_sb.sb_rgextents)
diff --git a/fs/xfs/libxfs/xfs_inode_util.c b/fs/xfs/libxfs/xfs_inode_util.c
index 48fe49a5f050..8589ec44feda 100644
--- a/fs/xfs/libxfs/xfs_inode_util.c
+++ b/fs/xfs/libxfs/xfs_inode_util.c
@@ -126,6 +126,8 @@ xfs_ip2xflags(
flags |= FS_XFLAG_DAX;
if (ip->i_diflags2 & XFS_DIFLAG2_COWEXTSIZE)
flags |= FS_XFLAG_COWEXTSIZE;
+ if (ip->i_diflags2 & XFS_DIFLAG2_VERITY)
+ flags |= FS_XFLAG_VERITY;
}
if (xfs_inode_has_attr_fork(ip))
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 8cddbb7c149b..83d31802f943 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -1393,6 +1393,8 @@ xfs_diflags_to_iflags(
flags |= S_NOATIME;
if (init && xfs_inode_should_enable_dax(ip))
flags |= S_DAX;
+ if (xflags & FS_XFLAG_VERITY)
+ flags |= S_VERITY;
/*
* S_DAX can only be set during inode initialization and is never set by
--
2.50.0
* [PATCH RFC 17/29] xfs: initialize fs-verity on file open and cleanup on inode destruction
2025-07-28 20:30 [PATCH RFC 00/29] fs-verity support for XFS with post EOF merkle tree Andrey Albershteyn
` (15 preceding siblings ...)
2025-07-28 20:30 ` [PATCH RFC 16/29] xfs: add inode on-disk VERITY flag Andrey Albershteyn
@ 2025-07-28 20:30 ` Andrey Albershteyn
2025-07-28 20:30 ` [PATCH RFC 18/29] xfs: don't allow to enable DAX on fs-verity sealed inode Andrey Albershteyn
` (11 subsequent siblings)
28 siblings, 0 replies; 75+ messages in thread
From: Andrey Albershteyn @ 2025-07-28 20:30 UTC (permalink / raw)
To: fsverity, linux-fsdevel, linux-xfs, david, djwong, ebiggers, hch
Cc: Andrey Albershteyn
From: Andrey Albershteyn <aalbersh@redhat.com>
fs-verity will read and attach metadata (not the tree itself) from
disk for those inodes which already have fs-verity enabled.
Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
fs/xfs/xfs_file.c | 8 ++++++++
fs/xfs/xfs_super.c | 2 ++
2 files changed, 10 insertions(+)
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 0b41b18debf3..da13162925d9 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -34,6 +34,7 @@
#include <linux/mman.h>
#include <linux/fadvise.h>
#include <linux/mount.h>
+#include <linux/fsverity.h>
static const struct vm_operations_struct xfs_file_vm_ops;
@@ -1555,11 +1556,18 @@ xfs_file_open(
struct inode *inode,
struct file *file)
{
+ int error;
+
if (xfs_is_shutdown(XFS_M(inode->i_sb)))
return -EIO;
file->f_mode |= FMODE_NOWAIT | FMODE_CAN_ODIRECT;
if (xfs_get_atomic_write_min(XFS_I(inode)) > 0)
file->f_mode |= FMODE_CAN_ATOMIC_WRITE;
+
+ error = fsverity_file_open(inode, file);
+ if (error)
+ return error;
+
return generic_file_open(inode, file);
}
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index bb0a82635a77..7b1ace75955c 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -53,6 +53,7 @@
#include <linux/magic.h>
#include <linux/fs_context.h>
#include <linux/fs_parser.h>
+#include <linux/fsverity.h>
static const struct super_operations xfs_super_operations;
@@ -699,6 +700,7 @@ xfs_fs_destroy_inode(
ASSERT(!rwsem_is_locked(&inode->i_rwsem));
XFS_STATS_INC(ip->i_mount, vn_rele);
XFS_STATS_INC(ip->i_mount, vn_remove);
+ fsverity_cleanup_inode(inode);
xfs_inode_mark_reclaimable(ip);
}
--
2.50.0
* [PATCH RFC 18/29] xfs: don't allow to enable DAX on fs-verity sealed inode
2025-07-28 20:30 [PATCH RFC 00/29] fs-verity support for XFS with post EOF merkle tree Andrey Albershteyn
` (16 preceding siblings ...)
2025-07-28 20:30 ` [PATCH RFC 17/29] xfs: initialize fs-verity on file open and cleanup on inode destruction Andrey Albershteyn
@ 2025-07-28 20:30 ` Andrey Albershteyn
2025-07-28 20:30 ` [PATCH RFC 19/29] xfs: disable direct read path for fs-verity files Andrey Albershteyn
` (10 subsequent siblings)
28 siblings, 0 replies; 75+ messages in thread
From: Andrey Albershteyn @ 2025-07-28 20:30 UTC (permalink / raw)
To: fsverity, linux-fsdevel, linux-xfs, david, djwong, ebiggers, hch
Cc: Andrey Albershteyn
From: Andrey Albershteyn <aalbersh@redhat.com>
fs-verity doesn't support DAX. Forbid the filesystem from enabling DAX
on inodes which already have fs-verity enabled. The opposite case is
checked when fs-verity is enabled: it won't be enabled if DAX is.
Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: fix typo in subject]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
fs/xfs/xfs_iops.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 83d31802f943..2d8ecdd0b403 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -1365,6 +1365,8 @@ xfs_inode_should_enable_dax(
return false;
if (!xfs_inode_supports_dax(ip))
return false;
+ if (ip->i_diflags2 & XFS_DIFLAG2_VERITY)
+ return false;
if (xfs_has_dax_always(ip->i_mount))
return true;
if (ip->i_diflags2 & XFS_DIFLAG2_DAX)
--
2.50.0
* [PATCH RFC 19/29] xfs: disable direct read path for fs-verity files
2025-07-28 20:30 [PATCH RFC 00/29] fs-verity support for XFS with post EOF merkle tree Andrey Albershteyn
` (17 preceding siblings ...)
2025-07-28 20:30 ` [PATCH RFC 18/29] xfs: don't allow to enable DAX on fs-verity sealed inode Andrey Albershteyn
@ 2025-07-28 20:30 ` Andrey Albershteyn
2025-07-28 20:30 ` [PATCH RFC 20/29] xfs: disable preallocations for fsverity Merkle tree writes Andrey Albershteyn
` (9 subsequent siblings)
28 siblings, 0 replies; 75+ messages in thread
From: Andrey Albershteyn @ 2025-07-28 20:30 UTC (permalink / raw)
To: fsverity, linux-fsdevel, linux-xfs, david, djwong, ebiggers, hch
Cc: Andrey Albershteyn
From: Andrey Albershteyn <aalbersh@redhat.com>
The direct I/O path is not supported on verity files. Attempts to use
the direct I/O path on such files should fall back to the buffered I/O
path.
Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: fix braces]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
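A rough user-space model of the dispatch in xfs_file_read_iter() after
this patch, to show why the direct flag has to be cleared on fallback.
IOCB_DIRECT_FLAG, struct read_req and pick_read_path() are hypothetical
stand-ins for IOCB_DIRECT, struct kiocb and the real function.

```c
#include <stdbool.h>

#define IOCB_DIRECT_FLAG	(1u << 0)	/* stand-in for IOCB_DIRECT */

enum read_path { READ_DAX, READ_DIRECT, READ_BUFFERED };

struct read_req {
	unsigned int flags;
};

/* Pick the read path; verity files never take the direct path. */
static enum read_path pick_read_path(struct read_req *req, bool is_dax,
				     bool verity_active)
{
	if (is_dax)
		return READ_DAX;
	if ((req->flags & IOCB_DIRECT_FLAG) && !verity_active)
		return READ_DIRECT;

	/*
	 * Falling back from a direct request: clear the flag so that the
	 * buffered helper does not attempt direct I/O on its own.
	 */
	req->flags &= ~IOCB_DIRECT_FLAG;
	return READ_BUFFERED;
}
```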
fs/xfs/xfs_file.c | 15 ++++++++++++---
1 file changed, 12 insertions(+), 3 deletions(-)
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index da13162925d9..9680c97ee40f 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -259,7 +259,8 @@ xfs_file_dax_read(
struct kiocb *iocb,
struct iov_iter *to)
{
- struct xfs_inode *ip = XFS_I(iocb->ki_filp->f_mapping->host);
+ struct inode *inode = iocb->ki_filp->f_mapping->host;
+ struct xfs_inode *ip = XFS_I(inode);
ssize_t ret = 0;
trace_xfs_file_dax_read(iocb, to);
@@ -312,10 +313,18 @@ xfs_file_read_iter(
if (IS_DAX(inode))
ret = xfs_file_dax_read(iocb, to);
- else if (iocb->ki_flags & IOCB_DIRECT)
+ else if ((iocb->ki_flags & IOCB_DIRECT) && !fsverity_active(inode))
ret = xfs_file_dio_read(iocb, to);
- else
+ else {
+ /*
+ * In case fs-verity is enabled, we also fall back to the
+ * buffered read from the direct read path. Therefore,
+ * IOCB_DIRECT is set and needs to be cleared (see
+ * generic_file_read_iter()).
+ */
+ iocb->ki_flags &= ~IOCB_DIRECT;
ret = xfs_file_buffered_read(iocb, to);
+ }
if (ret > 0)
XFS_STATS_ADD(mp, xs_read_bytes, ret);
--
2.50.0
* [PATCH RFC 20/29] xfs: disable preallocations for fsverity Merkle tree writes
2025-07-28 20:30 [PATCH RFC 00/29] fs-verity support for XFS with post EOF merkle tree Andrey Albershteyn
` (18 preceding siblings ...)
2025-07-28 20:30 ` [PATCH RFC 19/29] xfs: disable direct read path for fs-verity files Andrey Albershteyn
@ 2025-07-28 20:30 ` Andrey Albershteyn
2025-07-29 22:27 ` Darrick J. Wong
2025-07-28 20:30 ` [PATCH RFC 21/29] xfs: add writeback and iomap reading of Merkle tree pages Andrey Albershteyn
` (8 subsequent siblings)
28 siblings, 1 reply; 75+ messages in thread
From: Andrey Albershteyn @ 2025-07-28 20:30 UTC (permalink / raw)
To: fsverity, linux-fsdevel, linux-xfs, david, djwong, ebiggers, hch
Cc: Andrey Albershteyn
While the Merkle tree is being written, the file is read-only and there
are no further writes except for Merkle tree building. The file is
truncated beforehand to remove any preallocated extents.
The Merkle tree is the only data XFS will write. As we don't want XFS to
truncate the file after we are done writing, let's also skip truncation
on fsverity files. Therefore, we also need to disable preallocations
while writing the Merkle tree, as we don't want any unused extents past
the tree.
Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
---
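The preallocation decision in xfs_buffered_write_iomap_begin() can be
summarized by the sketch below. It is a simplification: the real third
branch only applies to the data fork and calls
xfs_iomap_prealloc_size(), which is represented here by a plain
dynamic_blocks argument. pick_prealloc_blocks() is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Choose the speculative preallocation size in blocks. While the
 * Merkle tree is under construction nothing is preallocated, so no
 * unused extents are left past the end of the tree.
 */
static uint64_t pick_prealloc_blocks(bool verity_construction,
				     bool has_allocsize,
				     uint64_t allocsize_blocks,
				     uint64_t dynamic_blocks)
{
	if (verity_construction)
		return 0;
	if (has_allocsize)
		return allocsize_blocks;
	return dynamic_blocks;
}
```

The verity check comes first on purpose: it has to override an
allocsize mount option as well as the dynamic heuristic.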
fs/xfs/xfs_iomap.c | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index ff05e6b1b0bb..00ec1a738b39 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -32,6 +32,8 @@
#include "xfs_rtbitmap.h"
#include "xfs_icache.h"
#include "xfs_zone_alloc.h"
+#include "xfs_fsverity.h"
+#include <linux/fsverity.h>
#define XFS_ALLOC_ALIGN(mp, off) \
(((off) >> mp->m_allocsize_log) << mp->m_allocsize_log)
@@ -1849,7 +1851,9 @@ xfs_buffered_write_iomap_begin(
* Determine the initial size of the preallocation.
* We clean up any extra preallocation when the file is closed.
*/
- if (xfs_has_allocsize(mp))
+ if (xfs_iflags_test(ip, XFS_VERITY_CONSTRUCTION))
+ prealloc_blocks = 0;
+ else if (xfs_has_allocsize(mp))
prealloc_blocks = mp->m_allocsize_blocks;
else if (allocfork == XFS_DATA_FORK)
prealloc_blocks = xfs_iomap_prealloc_size(ip, allocfork,
@@ -1976,6 +1980,13 @@ xfs_buffered_write_iomap_end(
if (flags & IOMAP_FAULT)
return 0;
+ /*
+ * While writing Merkle tree to disk we would not have any other
+ * delayed allocations
+ */
+ if (xfs_iflags_test(XFS_I(inode), XFS_VERITY_CONSTRUCTION))
+ return 0;
+
/* Nothing to do if we've written the entire delalloc extent */
start_byte = iomap_last_written_block(inode, offset, written);
end_byte = round_up(offset + length, i_blocksize(inode));
--
2.50.0
* [PATCH RFC 21/29] xfs: add writeback and iomap reading of Merkle tree pages
2025-07-28 20:30 [PATCH RFC 00/29] fs-verity support for XFS with post EOF merkle tree Andrey Albershteyn
` (19 preceding siblings ...)
2025-07-28 20:30 ` [PATCH RFC 20/29] xfs: disable preallocations for fsverity Merkle tree writes Andrey Albershteyn
@ 2025-07-28 20:30 ` Andrey Albershteyn
2025-07-29 22:33 ` Darrick J. Wong
2025-07-28 20:30 ` [PATCH RFC 22/29] xfs: add fs-verity support Andrey Albershteyn
` (7 subsequent siblings)
28 siblings, 1 reply; 75+ messages in thread
From: Andrey Albershteyn @ 2025-07-28 20:30 UTC (permalink / raw)
To: fsverity, linux-fsdevel, linux-xfs, david, djwong, ebiggers, hch
Cc: Andrey Albershteyn
In the writeback path, use the unbound write interface, meaning that the
inode size is not updated and none of the file size checks are applied.
In the read path, let iomap know via a flag that the data is stored
beyond EOF. This makes iomap skip post-EOF zeroing.
Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
---
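A sketch of the two decisions this patch makes, assuming the 1 << 53
tree offset from the cover letter. F_SHARED and F_BEYOND_EOF are
placeholder values standing in for IOMAP_F_SHARED and
IOMAP_F_BEYOND_EOF; struct wb_range and both helpers are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define MTREE_OFFSET	(1ULL << 53)	/* XFS_FSVERITY_MTREE_OFFSET */
#define F_SHARED	(1u << 0)	/* placeholder flag value */
#define F_BEYOND_EOF	(1u << 1)	/* placeholder flag value */

/* Read side: tag mappings in the tree region so iomap skips zeroing. */
static unsigned int read_iomap_flags(bool shared, bool verity_active,
				     uint64_t offset)
{
	unsigned int flags = shared ? F_SHARED : 0;

	if (verity_active && offset >= MTREE_OFFSET)
		flags |= F_BEYOND_EOF;
	return flags;
}

struct wb_range {
	int64_t start;
	int64_t end;
};

/* Writeback side: during construction only the tree region is written. */
static struct wb_range verity_writeback_range(void)
{
	struct wb_range range;

	range.start = (int64_t)MTREE_OFFSET;
	range.end = INT64_MAX;
	return range;
}
```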
fs/xfs/xfs_aops.c | 21 ++++++++++++++-------
fs/xfs/xfs_iomap.c | 9 +++++++--
2 files changed, 21 insertions(+), 9 deletions(-)
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 63151feb9c3f..02e2c04b36c1 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -22,6 +22,7 @@
#include "xfs_icache.h"
#include "xfs_zone_alloc.h"
#include "xfs_rtgroup.h"
+#include "xfs_fsverity.h"
struct xfs_writepage_ctx {
struct iomap_writepage_ctx ctx;
@@ -628,10 +629,12 @@ static const struct iomap_writeback_ops xfs_zoned_writeback_ops = {
STATIC int
xfs_vm_writepages(
- struct address_space *mapping,
- struct writeback_control *wbc)
+ struct address_space *mapping,
+ struct writeback_control *wbc)
{
- struct xfs_inode *ip = XFS_I(mapping->host);
+ struct xfs_inode *ip = XFS_I(mapping->host);
+ struct xfs_writepage_ctx wpc = { };
+
xfs_iflags_clear(ip, XFS_ITRUNCATED);
@@ -644,12 +647,16 @@ xfs_vm_writepages(
if (xc.open_zone)
xfs_open_zone_put(xc.open_zone);
return error;
- } else {
- struct xfs_writepage_ctx wpc = { };
+ }
- return iomap_writepages(mapping, wbc, &wpc.ctx,
- &xfs_writeback_ops);
+ if (xfs_iflags_test(ip, XFS_VERITY_CONSTRUCTION)) {
+ wbc->range_start = XFS_FSVERITY_MTREE_OFFSET;
+ wbc->range_end = LLONG_MAX;
+ return iomap_writepages_unbound(mapping, wbc, &wpc.ctx,
+ &xfs_writeback_ops);
}
+
+ return iomap_writepages(mapping, wbc, &wpc.ctx, &xfs_writeback_ops);
}
STATIC int
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 00ec1a738b39..c8725508165c 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -2031,6 +2031,7 @@ xfs_read_iomap_begin(
bool shared = false;
unsigned int lockmode = XFS_ILOCK_SHARED;
u64 seq;
+ int iomap_flags;
ASSERT(!(flags & (IOMAP_WRITE | IOMAP_ZERO)));
@@ -2050,8 +2051,12 @@ xfs_read_iomap_begin(
if (error)
return error;
trace_xfs_iomap_found(ip, offset, length, XFS_DATA_FORK, &imap);
- return xfs_bmbt_to_iomap(ip, iomap, &imap, flags,
- shared ? IOMAP_F_SHARED : 0, seq);
+ iomap_flags = shared ? IOMAP_F_SHARED : 0;
+
+ if (fsverity_active(inode) && offset >= XFS_FSVERITY_MTREE_OFFSET)
+ iomap_flags |= IOMAP_F_BEYOND_EOF;
+
+ return xfs_bmbt_to_iomap(ip, iomap, &imap, flags, iomap_flags, seq);
}
const struct iomap_ops xfs_read_iomap_ops = {
--
2.50.0
* [PATCH RFC 22/29] xfs: add fs-verity support
2025-07-28 20:30 [PATCH RFC 00/29] fs-verity support for XFS with post EOF merkle tree Andrey Albershteyn
` (20 preceding siblings ...)
2025-07-28 20:30 ` [PATCH RFC 21/29] xfs: add writeback and iomap reading of Merkle tree pages Andrey Albershteyn
@ 2025-07-28 20:30 ` Andrey Albershteyn
2025-07-29 23:05 ` Darrick J. Wong
2025-07-28 20:30 ` [PATCH RFC 23/29] xfs: add fs-verity ioctls Andrey Albershteyn
` (6 subsequent siblings)
28 siblings, 1 reply; 75+ messages in thread
From: Andrey Albershteyn @ 2025-07-28 20:30 UTC (permalink / raw)
To: fsverity, linux-fsdevel, linux-xfs, david, djwong, ebiggers, hch
Cc: Andrey Albershteyn
Add integration with fs-verity. XFS stores the fs-verity descriptor in
the extended file attributes, and the Merkle tree is stored in the data
fork beyond EOF.
The descriptor is stored under the "vdesc" extended attribute.
Merkle tree reading/writing is done through the iomap interface. The
tree data itself is at offset 1 << 53 in the inode's page cache. When
XFS reads from this region, iomap doesn't call into fsverity to verify
it against the Merkle tree. For file data, verification is done on BIO
completion.
When fs-verity is being enabled on an inode, the XFS_VERITY_CONSTRUCTION
flag is set, meaning that the Merkle tree is being built. The
initialization ends with storing the verity descriptor and setting the
inode on-disk flag (XFS_DIFLAG2_VERITY).
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
---
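The offset arithmetic used by the read and write helpers below can be
sketched in user space like this. MTREE_OFFSET and MTREE_MASK mirror
the defines in xfs_fsverity.h; the function names are hypothetical.
trim_read_length() follows the logic of xfs_fsverity_adjust_read() and,
like it, assumes the position lies inside the tree. max_tree_blocks()
reproduces the geometric sum from the header comment.

```c
#include <stdint.h>

#define MTREE_OFFSET	(1ULL << 53)		/* XFS_FSVERITY_MTREE_OFFSET */
#define MTREE_MASK	(MTREE_OFFSET - 1)	/* XFS_FSVERITY_MTREE_MASK */

/* Map an offset within the Merkle tree to a file/page cache offset. */
static uint64_t mtree_pos_to_file_pos(uint64_t tree_pos)
{
	return tree_pos | MTREE_OFFSET;
}

/* Map a file offset in the tree region back to an offset in the tree. */
static uint64_t file_pos_to_mtree_pos(uint64_t file_pos)
{
	return file_pos & MTREE_MASK;
}

/* Trim a read so that it does not run past the end of the tree. */
static uint64_t trim_read_length(uint64_t file_pos, uint64_t length,
				 uint64_t tree_size)
{
	uint64_t pos = file_pos_to_mtree_pos(file_pos);

	if (pos + length < tree_size)
		return length;
	return tree_size - pos;
}

/* Total blocks in a full tree: 1 + n + n^2 + ... over all levels. */
static uint64_t max_tree_blocks(uint64_t hashes_per_block,
				unsigned int levels)
{
	uint64_t total = 0;
	uint64_t level_blocks = 1;	/* the root level is a single block */
	unsigned int i;

	for (i = 0; i < levels; i++) {
		total += level_blocks;
		level_blocks *= hashes_per_block;
	}
	return total;
}
```

For example, with a 9216-byte tree and 4096-byte pages, the read of the
third page is trimmed to the remaining 1024 bytes, and a full 8-level
tree with 128 hashes per block has a block count below 2^53.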
fs/xfs/Makefile | 1 +
fs/xfs/libxfs/xfs_da_format.h | 4 +
fs/xfs/xfs_bmap_util.c | 7 +
fs/xfs/xfs_fsverity.c | 311 ++++++++++++++++++++++++++++++++++++++++++
fs/xfs/xfs_fsverity.h | 28 ++++
fs/xfs/xfs_inode.h | 6 +
fs/xfs/xfs_super.c | 20 +++
7 files changed, 377 insertions(+)
diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 5bf501cf8271..ad66439db7bf 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -147,6 +147,7 @@ xfs-$(CONFIG_XFS_POSIX_ACL) += xfs_acl.o
xfs-$(CONFIG_SYSCTL) += xfs_sysctl.o
xfs-$(CONFIG_COMPAT) += xfs_ioctl32.o
xfs-$(CONFIG_EXPORTFS_BLOCK_OPS) += xfs_pnfs.o
+xfs-$(CONFIG_FS_VERITY) += xfs_fsverity.o
# notify failure
ifeq ($(CONFIG_MEMORY_FAILURE),y)
diff --git a/fs/xfs/libxfs/xfs_da_format.h b/fs/xfs/libxfs/xfs_da_format.h
index e5274be2fe9c..b17fdbbb48aa 100644
--- a/fs/xfs/libxfs/xfs_da_format.h
+++ b/fs/xfs/libxfs/xfs_da_format.h
@@ -909,4 +909,8 @@ struct xfs_parent_rec {
__be32 p_gen;
} __packed;
+/* fs-verity ondisk xattr name used for the fsverity descriptor */
+#define XFS_VERITY_DESCRIPTOR_NAME "vdesc"
+#define XFS_VERITY_DESCRIPTOR_NAME_LEN (sizeof(XFS_VERITY_DESCRIPTOR_NAME) - 1)
+
#endif /* __XFS_DA_FORMAT_H__ */
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 06ca11731e43..af1933129647 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -31,6 +31,7 @@
#include "xfs_rtbitmap.h"
#include "xfs_rtgroup.h"
#include "xfs_zone_alloc.h"
+#include <linux/fsverity.h>
/* Kernel only BMAP related definitions and functions */
@@ -553,6 +554,12 @@ xfs_can_free_eofblocks(
if (last_fsb <= end_fsb)
return false;
+ /*
+ * Nothing to clean on fsverity inodes as they are read-only
+ */
+ if (IS_VERITY(VFS_I(ip)))
+ return false;
+
/*
* Check if there is an post-EOF extent to free. If there are any
* delalloc blocks attached to the inode (data fork delalloc
diff --git a/fs/xfs/xfs_fsverity.c b/fs/xfs/xfs_fsverity.c
new file mode 100644
index 000000000000..ec7dea4289d5
--- /dev/null
+++ b/fs/xfs/xfs_fsverity.c
@@ -0,0 +1,311 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2023 Red Hat, Inc.
+ */
+#include "xfs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_da_format.h"
+#include "xfs_da_btree.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_inode.h"
+#include "xfs_log_format.h"
+#include "xfs_attr.h"
+#include "xfs_bmap_util.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_attr_leaf.h"
+#include "xfs_trace.h"
+#include "xfs_quota.h"
+#include "xfs_ag.h"
+#include "xfs_fsverity.h"
+#include "xfs_iomap.h"
+#include <linux/fsverity.h>
+#include <linux/pagemap.h>
+
+/*
+ * Initialize an args structure to load or store the fsverity descriptor.
+ * Caller must ensure @args is zeroed except for value and valuelen.
+ */
+static inline void
+xfs_fsverity_init_vdesc_args(
+ struct xfs_inode *ip,
+ struct xfs_da_args *args)
+{
+ args->geo = ip->i_mount->m_attr_geo;
+ args->whichfork = XFS_ATTR_FORK;
+ args->attr_filter = XFS_ATTR_VERITY;
+ args->op_flags = XFS_DA_OP_OKNOENT;
+ args->dp = ip;
+ args->owner = ip->i_ino;
+ args->name = XFS_VERITY_DESCRIPTOR_NAME;
+ args->namelen = XFS_VERITY_DESCRIPTOR_NAME_LEN;
+ xfs_attr_sethash(args);
+}
+
+/* Delete the verity descriptor. */
+static int
+xfs_fsverity_delete_descriptor(
+ struct xfs_inode *ip)
+{
+ struct xfs_da_args args = { };
+
+ xfs_fsverity_init_vdesc_args(ip, &args);
+ return xfs_attr_set(&args, XFS_ATTRUPDATE_REMOVE, false);
+}
+
+/* Retrieve the verity descriptor. */
+static int
+xfs_fsverity_get_descriptor(
+ struct inode *inode,
+ void *buf,
+ size_t buf_size)
+{
+ struct xfs_inode *ip = XFS_I(inode);
+ struct xfs_da_args args = {
+ .value = buf,
+ .valuelen = buf_size,
+ };
+ int error = 0;
+
+ /*
+ * The fact that (returned attribute size) == (provided buf_size) is
+ * checked by xfs_attr_copy_value() (returns -ERANGE). No descriptor
+ * is treated as a short read so that common fsverity code will
+ * complain.
+ */
+ xfs_fsverity_init_vdesc_args(ip, &args);
+ error = xfs_attr_get(&args);
+ if (error == -ENOATTR)
+ return 0;
+ if (error)
+ return error;
+
+ return args.valuelen;
+}
+
+/* Try to remove all the fsverity metadata after a failed enablement. */
+static int
+xfs_fsverity_delete_metadata(
+ struct xfs_inode *ip)
+{
+ struct xfs_trans *tp;
+ struct xfs_mount *mp = ip->i_mount;
+ int error;
+
+ error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, 0, 0, 0, &tp);
+ if (error) {
+ ASSERT(xfs_is_shutdown(mp));
+ return error;
+ }
+
+ xfs_ilock(ip, XFS_ILOCK_EXCL);
+ xfs_trans_ijoin(tp, ip, 0);
+
+ /*
+ * As only merkle tree is getting removed, no need to change inode size
+ */
+ error = xfs_itruncate_extents(&tp, ip, XFS_DATA_FORK, XFS_ISIZE(ip));
+ if (error)
+ goto err_cancel;
+
+ error = xfs_trans_commit(tp);
+ xfs_iunlock(ip, XFS_ILOCK_EXCL);
+ if (error)
+ return error;
+
+ error = xfs_fsverity_delete_descriptor(ip);
+ return error != -ENOATTR ? error : 0;
+
+err_cancel:
+ xfs_trans_cancel(tp);
+ return error;
+}
+
+
+/* Prepare to enable fsverity by clearing old metadata. */
+static int
+xfs_fsverity_begin_enable(
+ struct file *filp)
+{
+ struct inode *inode = file_inode(filp);
+ struct xfs_inode *ip = XFS_I(inode);
+ int error;
+
+ xfs_assert_ilocked(ip, XFS_IOLOCK_EXCL);
+
+ if (IS_DAX(inode))
+ return -EINVAL;
+
+ if (inode->i_size > XFS_FSVERITY_MTREE_OFFSET)
+ return -EFBIG;
+
+ if (xfs_iflags_test_and_set(ip, XFS_VERITY_CONSTRUCTION))
+ return -EBUSY;
+
+ error = xfs_qm_dqattach(ip);
+ if (error)
+ return error;
+
+ return xfs_fsverity_delete_metadata(ip);
+}
+
+/* Complete (or fail) the process of enabling fsverity. */
+static int
+xfs_fsverity_end_enable(
+ struct file *filp,
+ const void *desc,
+ size_t desc_size,
+ u64 merkle_tree_size)
+{
+ struct xfs_da_args args = {
+ .value = (void *)desc,
+ .valuelen = desc_size,
+ };
+ struct inode *inode = file_inode(filp);
+ struct xfs_inode *ip = XFS_I(inode);
+ struct xfs_mount *mp = ip->i_mount;
+ struct xfs_trans *tp;
+ int error = 0;
+
+ xfs_assert_ilocked(ip, XFS_IOLOCK_EXCL);
+
+ /* fs-verity failed, just cleanup */
+ if (desc == NULL)
+ goto out;
+
+ xfs_fsverity_init_vdesc_args(ip, &args);
+ error = xfs_attr_set(&args, XFS_ATTRUPDATE_UPSERT, false);
+ if (error)
+ goto out;
+
+ /*
+ * Wait for the Merkle tree to be written to disk before setting the
+ * on-disk inode flag and clearing XFS_VERITY_CONSTRUCTION.
+ */
+ error = filemap_write_and_wait(inode->i_mapping);
+ if (error)
+ goto out;
+
+ /* Set fsverity inode flag */
+ error = xfs_trans_alloc_inode(ip, &M_RES(mp)->tr_ichange,
+ 0, 0, false, &tp);
+ if (error)
+ goto out;
+
+ /*
+ * Ensure that we've persisted the verity information before we enable
+ * it on the inode and tell the caller we have sealed the inode.
+ */
+ ip->i_diflags2 |= XFS_DIFLAG2_VERITY;
+
+ xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
+ xfs_trans_set_sync(tp);
+
+ error = xfs_trans_commit(tp);
+ xfs_iunlock(ip, XFS_ILOCK_EXCL);
+
+ if (!error)
+ inode->i_flags |= S_VERITY;
+
+out:
+ if (error) {
+ int error2;
+
+ error2 = xfs_fsverity_delete_metadata(ip);
+ if (error2)
+ xfs_alert(ip->i_mount,
+"ino 0x%llx failed to clean up new fsverity metadata, err %d",
+ ip->i_ino, error2);
+ }
+
+ xfs_iflags_clear(ip, XFS_VERITY_CONSTRUCTION);
+ return error;
+}
+
+static void
+xfs_fsverity_adjust_read(
+ struct ioregion *region)
+{
+ u8 log_blocksize;
+ unsigned int block_size;
+ u64 tree_size;
+ u64 position = region->pos & XFS_FSVERITY_MTREE_MASK;
+
+ fsverity_merkle_tree_geometry(region->inode, &log_blocksize,
+ &block_size, &tree_size);
+
+ if (position + region->length < tree_size)
+ return;
+
+ region->length = tree_size - position;
+}
+
+/* Retrieve a merkle tree block. */
+static struct page *
+xfs_fsverity_read_merkle(
+ struct inode *inode,
+ pgoff_t index,
+ unsigned long num_ra_pages)
+{
+ u64 position = (index << PAGE_SHIFT) | XFS_FSVERITY_MTREE_OFFSET;
+ struct ioregion region = {
+ .inode = inode,
+ .pos = position,
+ .length = PAGE_SIZE,
+ .ops = &xfs_read_iomap_ops,
+ };
+ int error;
+ struct folio *folio;
+
+ /*
+ * As region->length is PAGE_SIZE, we have to adjust the length at the
+ * end of the tree. This handles the case when the tree block size is
+ * smaller than PAGE_SIZE.
+ */
+ xfs_fsverity_adjust_read(&region);
+
+ folio = iomap_read_region(&region);
+ if (IS_ERR(folio))
+ return ERR_PTR(-EIO);
+
+ /* Wait for buffered read to finish */
+ error = folio_wait_locked_killable(folio);
+ if (error)
+ return ERR_PTR(error);
+ if (IS_ERR(folio) || !folio_test_uptodate(folio))
+ return ERR_PTR(-EFSCORRUPTED);
+
+ return folio_file_page(folio, 0);
+}
+
+/* Write a merkle tree block. */
+static int
+xfs_fsverity_write_merkle(
+ struct inode *inode,
+ const void *buf,
+ u64 pos,
+ unsigned int size)
+{
+ struct ioregion region = {
+ .inode = inode,
+ .pos = pos | XFS_FSVERITY_MTREE_OFFSET,
+ .buf = buf,
+ .length = size,
+ .ops = &xfs_buffered_write_iomap_ops,
+ };
+
+ if (region.pos + region.length > inode->i_sb->s_maxbytes)
+ return -EFBIG;
+
+ return iomap_write_region(&region);
+}
+
+const struct fsverity_operations xfs_fsverity_ops = {
+ .begin_enable_verity = xfs_fsverity_begin_enable,
+ .end_enable_verity = xfs_fsverity_end_enable,
+ .get_verity_descriptor = xfs_fsverity_get_descriptor,
+ .read_merkle_tree_page = xfs_fsverity_read_merkle,
+ .write_merkle_tree_block = xfs_fsverity_write_merkle,
+};
diff --git a/fs/xfs/xfs_fsverity.h b/fs/xfs/xfs_fsverity.h
new file mode 100644
index 000000000000..e063b7288dc0
--- /dev/null
+++ b/fs/xfs/xfs_fsverity.h
@@ -0,0 +1,28 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2022 Red Hat, Inc.
+ */
+#ifndef __XFS_FSVERITY_H__
+#define __XFS_FSVERITY_H__
+
+/* Merkle tree location in page cache. We take a memory region from the
+ * inode's address space for the Merkle tree.
+ *
+ * With a maximum of 8 levels and 128 hashes per block (32-byte SHA-256), the
+ * maximum tree size is ((128^8 - 1)/(128 - 1)) = 567*10^12 blocks. This should
+ * fit in a 53-bit address space.
+ *
+ * A Merkle tree of this size can cover a 295EB file. This is much larger
+ * than the currently supported file size.
+ *
+ * As we allocate some pagecache space for the fsverity tree, we need to limit
+ * the maximum fsverity file size.
+ */
+#define XFS_FSVERITY_MTREE_OFFSET (1ULL << 53)
+#define XFS_FSVERITY_MTREE_MASK (XFS_FSVERITY_MTREE_OFFSET - 1)
+
+#ifdef CONFIG_FS_VERITY
+extern const struct fsverity_operations xfs_fsverity_ops;
+#endif /* CONFIG_FS_VERITY */
+
+#endif /* __XFS_FSVERITY_H__ */
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index d7e2b902ef5c..033e1a71e64b 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -404,6 +404,12 @@ static inline bool xfs_inode_can_hw_atomic_write(const struct xfs_inode *ip)
*/
#define XFS_IREMAPPING (1U << 15)
+/*
+ * fs-verity's Merkle tree is under construction. The file is read-only; the
+ * only writes happening are those of Merkle tree blocks.
+ */
+#define XFS_VERITY_CONSTRUCTION (1U << 16)
+
/* All inode state flags related to inode reclaim. */
#define XFS_ALL_IRECLAIM_FLAGS (XFS_IRECLAIMABLE | \
XFS_IRECLAIM | \
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 7b1ace75955c..10fb23ac672a 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -30,6 +30,7 @@
#include "xfs_filestream.h"
#include "xfs_quota.h"
#include "xfs_sysfs.h"
+#include "xfs_fsverity.h"
#include "xfs_ondisk.h"
#include "xfs_rmap_item.h"
#include "xfs_refcount_item.h"
@@ -54,6 +55,7 @@
#include <linux/fs_context.h>
#include <linux/fs_parser.h>
#include <linux/fsverity.h>
+#include <linux/iomap.h>
static const struct super_operations xfs_super_operations;
@@ -1711,6 +1713,9 @@ xfs_fs_fill_super(
sb->s_quota_types = QTYPE_MASK_USR | QTYPE_MASK_GRP | QTYPE_MASK_PRJ;
#endif
sb->s_op = &xfs_super_operations;
+#ifdef CONFIG_FS_VERITY
+ sb->s_vop = &xfs_fsverity_ops;
+#endif
/*
* Delay mount work if the debug hook is set. This is debug
@@ -1964,10 +1969,25 @@ xfs_fs_fill_super(
xfs_set_resuming_quotaon(mp);
mp->m_qflags &= ~XFS_QFLAGS_MNTOPTS;
+ if (xfs_has_verity(mp))
+ xfs_warn(mp,
+ "EXPERIMENTAL fsverity feature in use. Use at your own risk!");
+
error = xfs_mountfs(mp);
if (error)
goto out_filestream_unmount;
+#ifdef CONFIG_FS_VERITY
+ /*
+ * Don't use a high priority workqueue like the other fsverity
+ * implementations because that will lead to conflicts with the xfs log
+ * workqueue.
+ */
+ error = iomap_init_fsverity(mp->m_super, 0, 0);
+ if (error)
+ goto out_unmount;
+#endif
+
root = igrab(VFS_I(mp->m_rootip));
if (!root) {
error = -ENOENT;
--
2.50.0
^ permalink raw reply related [flat|nested] 75+ messages in thread
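[Editorial sketch, not part of the patch.] The tree-size bound quoted in the
xfs_fsverity.h comment above can be checked with a few lines of standalone C.
This is only a back-of-the-envelope check under the comment's own assumptions
(8 levels, 128 hashes per block). The function names are made up for
illustration and do not exist in the kernel.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Total number of blocks in a full tree with @levels levels and @arity
 * hashes per block: 1 + arity + arity^2 + ... + arity^(levels - 1),
 * which equals (arity^levels - 1) / (arity - 1).
 */
uint64_t merkle_tree_blocks(uint64_t arity, unsigned int levels)
{
	uint64_t total = 0;
	uint64_t level_blocks = 1;	/* the root level is a single block */

	/* Illustrative guard: keeps every intermediate value inside 64 bits. */
	assert(arity >= 2 && arity <= 128 && levels >= 1 && levels <= 8);

	for (unsigned int i = 0; i < levels; i++) {
		total += level_blocks;
		level_blocks *= arity;
	}
	return total;
}

/* Number of data blocks hashed by the leaf level: arity^levels. */
uint64_t merkle_data_blocks(uint64_t arity, unsigned int levels)
{
	uint64_t blocks = 1;

	assert(arity >= 2 && arity <= 128 && levels >= 1 && levels <= 8);

	for (unsigned int i = 0; i < levels; i++)
		blocks *= arity;
	return blocks;
}
```

For 128 hashes per block and 8 levels this gives 567,382,630,219,905 tree
blocks, which is below 2^53 when counted in blocks, and 2^56 data blocks.
Assuming 4096-byte blocks (the comment does not state a block size), 2^56 data
blocks is 2^68 bytes, which is where a figure of roughly 295EB would come from.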
* [PATCH RFC 23/29] xfs: add fs-verity ioctls
2025-07-28 20:30 [PATCH RFC 00/29] fs-verity support for XFS with post EOF merkle tree Andrey Albershteyn
` (21 preceding siblings ...)
2025-07-28 20:30 ` [PATCH RFC 22/29] xfs: add fs-verity support Andrey Albershteyn
@ 2025-07-28 20:30 ` Andrey Albershteyn
2025-07-28 20:30 ` [PATCH RFC 24/29] xfs: advertise fs-verity being available on filesystem Andrey Albershteyn
` (5 subsequent siblings)
28 siblings, 0 replies; 75+ messages in thread
From: Andrey Albershteyn @ 2025-07-28 20:30 UTC (permalink / raw)
To: fsverity, linux-fsdevel, linux-xfs, david, djwong, ebiggers, hch
Cc: Andrey Albershteyn, Andrey Albershteyn
From: Andrey Albershteyn <aalbersh@redhat.com>
Add fs-verity ioctls to enable verity, dump metadata (descriptor and
Merkle tree pages), and obtain the file's digest.
[djwong: remove unnecessary casting]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
---
fs/xfs/xfs_ioctl.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index d250f7f74e3b..589c36ee4e7f 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -44,6 +44,7 @@
#include <linux/mount.h>
#include <linux/fileattr.h>
+#include <linux/fsverity.h>
/* Return 0 on success or positive error */
int
@@ -1428,6 +1429,21 @@ xfs_file_ioctl(
case XFS_IOC_COMMIT_RANGE:
return xfs_ioc_commit_range(filp, arg);
+ case FS_IOC_ENABLE_VERITY:
+ if (!xfs_has_verity(mp))
+ return -EOPNOTSUPP;
+ return fsverity_ioctl_enable(filp, arg);
+
+ case FS_IOC_MEASURE_VERITY:
+ if (!xfs_has_verity(mp))
+ return -EOPNOTSUPP;
+ return fsverity_ioctl_measure(filp, arg);
+
+ case FS_IOC_READ_VERITY_METADATA:
+ if (!xfs_has_verity(mp))
+ return -EOPNOTSUPP;
+ return fsverity_ioctl_read_metadata(filp, arg);
+
default:
return -ENOTTY;
}
--
2.50.0
^ permalink raw reply related [flat|nested] 75+ messages in thread
* [PATCH RFC 24/29] xfs: advertise fs-verity being available on filesystem
2025-07-28 20:30 [PATCH RFC 00/29] fs-verity support for XFS with post EOF merkle tree Andrey Albershteyn
` (22 preceding siblings ...)
2025-07-28 20:30 ` [PATCH RFC 23/29] xfs: add fs-verity ioctls Andrey Albershteyn
@ 2025-07-28 20:30 ` Andrey Albershteyn
2025-07-28 20:30 ` [PATCH RFC 25/29] xfs: check and repair the verity inode flag state Andrey Albershteyn
` (4 subsequent siblings)
28 siblings, 0 replies; 75+ messages in thread
From: Andrey Albershteyn @ 2025-07-28 20:30 UTC (permalink / raw)
To: fsverity, linux-fsdevel, linux-xfs, david, djwong, ebiggers, hch
Cc: Andrey Albershteyn
From: "Darrick J. Wong" <djwong@kernel.org>
Advertise that this filesystem supports fsverity.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Andrey Albershteyn <aalbersh@redhat.com>
---
fs/xfs/libxfs/xfs_fs.h | 1 +
fs/xfs/libxfs/xfs_sb.c | 2 ++
2 files changed, 3 insertions(+)
diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index 12463ba766da..9d07c7872e94 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -250,6 +250,7 @@ typedef struct xfs_fsop_resblks {
#define XFS_FSOP_GEOM_FLAGS_PARENT (1 << 25) /* linux parent pointers */
#define XFS_FSOP_GEOM_FLAGS_METADIR (1 << 26) /* metadata directories */
#define XFS_FSOP_GEOM_FLAGS_ZONED (1 << 27) /* zoned rt device */
+#define XFS_FSOP_GEOM_FLAGS_VERITY (1 << 28) /* fs-verity */
/*
* Minimum and maximum sizes need for growth checks.
diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
index e32cd2874bcd..7507518a1c72 100644
--- a/fs/xfs/libxfs/xfs_sb.c
+++ b/fs/xfs/libxfs/xfs_sb.c
@@ -1575,6 +1575,8 @@ xfs_fs_geometry(
geo->flags |= XFS_FSOP_GEOM_FLAGS_METADIR;
if (xfs_has_zoned(mp))
geo->flags |= XFS_FSOP_GEOM_FLAGS_ZONED;
+ if (xfs_has_verity(mp))
+ geo->flags |= XFS_FSOP_GEOM_FLAGS_VERITY;
geo->rtsectsize = sbp->sb_blocksize;
geo->dirblocksize = xfs_dir2_dirblock_bytes(sbp);
--
2.50.0
^ permalink raw reply related [flat|nested] 75+ messages in thread
* [PATCH RFC 25/29] xfs: check and repair the verity inode flag state
2025-07-28 20:30 [PATCH RFC 00/29] fs-verity support for XFS with post EOF merkle tree Andrey Albershteyn
` (23 preceding siblings ...)
2025-07-28 20:30 ` [PATCH RFC 24/29] xfs: advertise fs-verity being available on filesystem Andrey Albershteyn
@ 2025-07-28 20:30 ` Andrey Albershteyn
2025-07-28 20:30 ` [PATCH RFC 26/29] xfs: fix scrub trace with null pointer in quotacheck Andrey Albershteyn
` (3 subsequent siblings)
28 siblings, 0 replies; 75+ messages in thread
From: Andrey Albershteyn @ 2025-07-28 20:30 UTC (permalink / raw)
To: fsverity, linux-fsdevel, linux-xfs, david, djwong, ebiggers, hch
From: "Darrick J. Wong" <djwong@kernel.org>
If an inode has the incore verity iflag set, make sure that we can
actually activate fsverity on that inode. If activation fails due to
a fsverity metadata validation error, clear the flag. The usage model
for fsverity requires any program that cares about verity state to
call statx/getflags to check that the flag is set after opening the
file, so clearing the flag will not compromise that model.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
fs/xfs/scrub/attr.c | 7 +++++
fs/xfs/scrub/common.c | 74 +++++++++++++++++++++++++++++++++++++++++++++
fs/xfs/scrub/common.h | 3 ++
fs/xfs/scrub/inode.c | 7 +++++
fs/xfs/scrub/inode_repair.c | 36 ++++++++++++++++++++++
5 files changed, 127 insertions(+)
diff --git a/fs/xfs/scrub/attr.c b/fs/xfs/scrub/attr.c
index 708334f9b2bd..b1448832ae6b 100644
--- a/fs/xfs/scrub/attr.c
+++ b/fs/xfs/scrub/attr.c
@@ -646,6 +646,13 @@ xchk_xattr(
if (!xfs_inode_hasattr(sc->ip))
return -ENOENT;
+ /*
+ * If this is a verity file that won't activate, we cannot check the
+ * merkle tree geometry.
+ */
+ if (xchk_inode_verity_broken(sc->ip))
+ xchk_set_incomplete(sc);
+
/* Allocate memory for xattr checking. */
error = xchk_setup_xattr_buf(sc, 0);
if (error == -ENOMEM)
diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
index 28ad341df8ee..df031f831bba 100644
--- a/fs/xfs/scrub/common.c
+++ b/fs/xfs/scrub/common.c
@@ -45,6 +45,8 @@
#include "scrub/health.h"
#include "scrub/tempfile.h"
+#include <linux/fsverity.h>
+
/* Common code for the metadata scrubbers. */
/*
@@ -1735,3 +1737,75 @@ xchk_inode_count_blocks(
return xfs_bmap_count_blocks(sc->tp, sc->ip, whichfork, nextents,
count);
}
+
+/*
+ * If this inode has S_VERITY set on it, read the merkle tree geometry, which
+ * will activate the incore fsverity context for this file. If the activation
+ * fails with anything other than ENOMEM, the file is corrupt, which we can
+ * detect later with fsverity_active.
+ *
+ * Callers must hold the IOLOCK and must not hold the ILOCK of sc->ip because
+ * activation reads xattrs. @log_blocksize, @blocksize and @treesize will be
+ * filled out with merkle tree geometry if they are not NULL pointers.
+ */
+int
+xchk_inode_setup_verity(
+ struct xfs_scrub *sc,
+ u8 *log_blocksize,
+ unsigned int *blocksize,
+ u64 *treesize)
+{
+ u8 lbs;
+ unsigned int bs;
+ u64 ts;
+ int error;
+
+ if (!IS_VERITY(VFS_I(sc->ip)))
+ return 0;
+
+ error = fsverity_merkle_tree_geometry(VFS_I(sc->ip), &lbs, &bs, &ts);
+ switch (error) {
+ case 0:
+ /* fsverity is active; return tree geometry. */
+ if (log_blocksize)
+ *log_blocksize = lbs;
+ if (blocksize)
+ *blocksize = bs;
+ if (treesize)
+ *treesize = ts;
+ break;
+ case -ENODATA:
+ case -EMSGSIZE:
+ case -EINVAL:
+ case -EFSCORRUPTED:
+ case -EFBIG:
+ /*
+ * The nonzero errno codes above are the error codes that can
+ * be returned from fsverity on metadata validation errors.
+ * Set the geometry to zero.
+ */
+ if (log_blocksize)
+ *log_blocksize = 0;
+ if (blocksize)
+ *blocksize = 0;
+ if (treesize)
+ *treesize = 0;
+ return 0;
+ default:
+ /* runtime errors */
+ return error;
+ }
+
+ return 0;
+}
+
+/*
+ * Is this a verity file that failed to activate? Callers must have tried to
+ * activate fsverity via xchk_inode_setup_verity.
+ */
+bool
+xchk_inode_verity_broken(
+ struct xfs_inode *ip)
+{
+ return IS_VERITY(VFS_I(ip)) && !fsverity_active(VFS_I(ip));
+}
diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
index 19877d99f255..c6ec7b0cc68a 100644
--- a/fs/xfs/scrub/common.h
+++ b/fs/xfs/scrub/common.h
@@ -289,6 +289,9 @@ int xchk_inode_is_allocated(struct xfs_scrub *sc, xfs_agino_t agino,
bool *inuse);
int xchk_inode_count_blocks(struct xfs_scrub *sc, int whichfork,
xfs_extnum_t *nextents, xfs_filblks_t *count);
+int xchk_inode_setup_verity(struct xfs_scrub *sc, u8 *log_blocksize,
+ unsigned int *blocksize, u64 *treesize);
+bool xchk_inode_verity_broken(struct xfs_inode *ip);
bool xchk_inode_is_dirtree_root(const struct xfs_inode *ip);
bool xchk_inode_is_sb_rooted(const struct xfs_inode *ip);
diff --git a/fs/xfs/scrub/inode.c b/fs/xfs/scrub/inode.c
index bb3f475b6353..f173d1fea6bd 100644
--- a/fs/xfs/scrub/inode.c
+++ b/fs/xfs/scrub/inode.c
@@ -36,6 +36,10 @@ xchk_prepare_iscrub(
xchk_ilock(sc, XFS_IOLOCK_EXCL);
+ error = xchk_inode_setup_verity(sc, NULL, NULL, NULL);
+ if (error)
+ return error;
+
error = xchk_trans_alloc(sc, 0);
if (error)
return error;
@@ -833,6 +837,9 @@ xchk_inode(
if (S_ISREG(VFS_I(sc->ip)->i_mode))
xchk_inode_check_reflink_iflag(sc, sc->ip->i_ino);
+ if (xchk_inode_verity_broken(sc->ip))
+ xchk_ino_set_corrupt(sc, sc->sm->sm_ino);
+
xchk_inode_check_unlinked(sc);
xchk_inode_xref(sc, sc->ip->i_ino, &di);
diff --git a/fs/xfs/scrub/inode_repair.c b/fs/xfs/scrub/inode_repair.c
index a90a011c7e5f..e5dd39759f7a 100644
--- a/fs/xfs/scrub/inode_repair.c
+++ b/fs/xfs/scrub/inode_repair.c
@@ -573,6 +573,8 @@ xrep_dinode_flags(
dip->di_nrext64_pad = 0;
else if (dip->di_version >= 3)
dip->di_v3_pad = 0;
+ if (!xfs_has_verity(mp) || !S_ISREG(mode))
+ flags2 &= ~XFS_DIFLAG2_VERITY;
if (flags2 & XFS_DIFLAG2_METADATA) {
xfs_failaddr_t fa;
@@ -1613,6 +1615,10 @@ xrep_dinode_core(
if (iget_error)
return iget_error;
+ error = xchk_inode_setup_verity(sc, NULL, NULL, NULL);
+ if (error)
+ return error;
+
error = xchk_trans_alloc(sc, 0);
if (error)
return error;
@@ -2032,6 +2038,27 @@ xrep_inode_unlinked(
return 0;
}
+/*
+ * If this file is a fsverity file, xchk_prepare_iscrub or xrep_dinode_core
+ * should have activated it. If it's still not active, then there's something
+ * wrong with the verity descriptor and we should turn it off.
+ */
+STATIC int
+xrep_inode_verity(
+ struct xfs_scrub *sc)
+{
+ struct inode *inode = VFS_I(sc->ip);
+
+ if (xchk_inode_verity_broken(sc->ip)) {
+ sc->ip->i_diflags2 &= ~XFS_DIFLAG2_VERITY;
+ inode->i_flags &= ~S_VERITY;
+
+ xfs_trans_log_inode(sc->tp, sc->ip, XFS_ILOG_CORE);
+ }
+
+ return 0;
+}
+
/* Repair an inode's fields. */
int
xrep_inode(
@@ -2081,6 +2108,15 @@ xrep_inode(
return error;
}
+ /*
+ * Disable fsverity if it cannot be activated. Activation failure
+ * prohibits the file from being opened, so there cannot be another
+ * program with an open fd to what it thinks is a verity file.
+ */
+ error = xrep_inode_verity(sc);
+ if (error)
+ return error;
+
/* Reconnect incore unlinked list */
error = xrep_inode_unlinked(sc);
if (error)
--
2.50.0
^ permalink raw reply related [flat|nested] 75+ messages in thread
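[Editorial sketch, not part of the patch.] The error handling in
xchk_inode_setup_verity() above sorts return codes into three outcomes:
verity activated, verity metadata failed validation, or an unrelated runtime
error that should be passed up. The standalone C below restates that switch
outside the kernel. The enum and function names are hypothetical, and
EFSCORRUPTED is mapped to EUCLEAN here, which is how the kernel defines it.

```c
#include <assert.h>
#include <errno.h>

/* The kernel defines EFSCORRUPTED as EUCLEAN; mirror that for userspace. */
#ifndef EFSCORRUPTED
#define EFSCORRUPTED EUCLEAN
#endif

/* Hypothetical outcome labels, used only in this sketch. */
enum verity_setup_result {
	VERITY_ACTIVE,		/* geometry was read, fsverity is active */
	VERITY_BROKEN,		/* metadata validation failed, flag is suspect */
	VERITY_RUNTIME_ERROR,	/* e.g. -ENOMEM: report to the caller */
};

/* Classify a kernel-style return value (0 or a negative errno). */
enum verity_setup_result classify_verity_error(int error)
{
	assert(error <= 0);

	switch (error) {
	case 0:
		return VERITY_ACTIVE;
	case -ENODATA:
	case -EMSGSIZE:
	case -EINVAL:
	case -EFSCORRUPTED:
	case -EFBIG:
		/* Codes the patch treats as metadata validation failures. */
		return VERITY_BROKEN;
	default:
		return VERITY_RUNTIME_ERROR;
	}
}
```

In the patch, the "broken" case still returns 0 from the setup helper; the
corruption is detected afterwards by xchk_inode_verity_broken(), which checks
for S_VERITY without an active fsverity context.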
* [PATCH RFC 26/29] xfs: fix scrub trace with null pointer in quotacheck
2025-07-28 20:30 [PATCH RFC 00/29] fs-verity support for XFS with post EOF merkle tree Andrey Albershteyn
` (24 preceding siblings ...)
2025-07-28 20:30 ` [PATCH RFC 25/29] xfs: check and repair the verity inode flag state Andrey Albershteyn
@ 2025-07-28 20:30 ` Andrey Albershteyn
2025-07-29 15:28 ` Darrick J. Wong
2025-07-28 20:30 ` [PATCH RFC 27/29] xfs: report verity failures through the health system Andrey Albershteyn
` (2 subsequent siblings)
28 siblings, 1 reply; 75+ messages in thread
From: Andrey Albershteyn @ 2025-07-28 20:30 UTC (permalink / raw)
To: fsverity, linux-fsdevel, linux-xfs, david, djwong, ebiggers, hch
Cc: Andrey Albershteyn
The quotacheck doesn't initialize sc->ip, so the xchk_dqiter tracepoint
class must not dereference it. Get the device from sc->mp instead.
Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
---
fs/xfs/scrub/trace.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h
index d7c4ced47c15..4368f08e91c6 100644
--- a/fs/xfs/scrub/trace.h
+++ b/fs/xfs/scrub/trace.h
@@ -479,7 +479,7 @@ DECLARE_EVENT_CLASS(xchk_dqiter_class,
__field(xfs_exntst_t, state)
),
TP_fast_assign(
- __entry->dev = cursor->sc->ip->i_mount->m_super->s_dev;
+ __entry->dev = cursor->sc->mp->m_super->s_dev;
__entry->dqtype = cursor->dqtype;
__entry->ino = cursor->quota_ip->i_ino;
__entry->cur_id = cursor->id;
--
2.50.0
^ permalink raw reply related [flat|nested] 75+ messages in thread
* [PATCH RFC 27/29] xfs: report verity failures through the health system
2025-07-28 20:30 [PATCH RFC 00/29] fs-verity support for XFS with post EOF merkle tree Andrey Albershteyn
` (25 preceding siblings ...)
2025-07-28 20:30 ` [PATCH RFC 26/29] xfs: fix scrub trace with null pointer in quotacheck Andrey Albershteyn
@ 2025-07-28 20:30 ` Andrey Albershteyn
2025-07-28 20:30 ` [PATCH RFC 28/29] xfs: add fsverity traces Andrey Albershteyn
2025-07-28 20:30 ` [PATCH RFC 29/29] xfs: enable ro-compat fs-verity flag Andrey Albershteyn
28 siblings, 0 replies; 75+ messages in thread
From: Andrey Albershteyn @ 2025-07-28 20:30 UTC (permalink / raw)
To: fsverity, linux-fsdevel, linux-xfs, david, djwong, ebiggers, hch
Cc: Andrey Albershteyn
From: "Darrick J. Wong" <djwong@kernel.org>
Record verity failures and report them through the health system.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Andrey Albershteyn <aalbersh@redhat.com>
---
fs/xfs/libxfs/xfs_fs.h | 1 +
fs/xfs/libxfs/xfs_health.h | 4 +++-
fs/xfs/xfs_fsverity.c | 11 +++++++++++
fs/xfs/xfs_health.c | 1 +
4 files changed, 16 insertions(+), 1 deletion(-)
diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index 9d07c7872e94..501e4e59c6c9 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -422,6 +422,7 @@ struct xfs_bulkstat {
#define XFS_BS_SICK_SYMLINK (1 << 6) /* symbolic link remote target */
#define XFS_BS_SICK_PARENT (1 << 7) /* parent pointers */
#define XFS_BS_SICK_DIRTREE (1 << 8) /* directory tree structure */
+#define XFS_BS_SICK_DATA (1 << 9) /* file data */
/*
* Project quota id helpers (previously projid was 16bit only
diff --git a/fs/xfs/libxfs/xfs_health.h b/fs/xfs/libxfs/xfs_health.h
index b31000f7190c..fa91916ad072 100644
--- a/fs/xfs/libxfs/xfs_health.h
+++ b/fs/xfs/libxfs/xfs_health.h
@@ -104,6 +104,7 @@ struct xfs_rtgroup;
/* Don't propagate sick status to ag health summary during inactivation */
#define XFS_SICK_INO_FORGET (1 << 12)
#define XFS_SICK_INO_DIRTREE (1 << 13) /* directory tree structure */
+#define XFS_SICK_INO_DATA (1 << 14) /* file data */
/* Primary evidence of health problems in a given group. */
#define XFS_SICK_FS_PRIMARY (XFS_SICK_FS_COUNTERS | \
@@ -140,7 +141,8 @@ struct xfs_rtgroup;
XFS_SICK_INO_XATTR | \
XFS_SICK_INO_SYMLINK | \
XFS_SICK_INO_PARENT | \
- XFS_SICK_INO_DIRTREE)
+ XFS_SICK_INO_DIRTREE | \
+ XFS_SICK_INO_DATA)
#define XFS_SICK_INO_ZAPPED (XFS_SICK_INO_BMBTD_ZAPPED | \
XFS_SICK_INO_BMBTA_ZAPPED | \
diff --git a/fs/xfs/xfs_fsverity.c b/fs/xfs/xfs_fsverity.c
index ec7dea4289d5..dfe7b0bcd97e 100644
--- a/fs/xfs/xfs_fsverity.c
+++ b/fs/xfs/xfs_fsverity.c
@@ -21,6 +21,7 @@
#include "xfs_ag.h"
#include "xfs_fsverity.h"
#include "xfs_iomap.h"
+#include "xfs_health.h"
#include <linux/fsverity.h>
#include <linux/pagemap.h>
@@ -302,10 +303,20 @@ xfs_fsverity_write_merkle(
 return iomap_write_region(&region);
}
+static void
+xfs_fsverity_file_corrupt(
+ struct inode *inode,
+ loff_t pos,
+ size_t len)
+{
+ xfs_inode_mark_sick(XFS_I(inode), XFS_SICK_INO_DATA);
+}
+
const struct fsverity_operations xfs_fsverity_ops = {
.begin_enable_verity = xfs_fsverity_begin_enable,
.end_enable_verity = xfs_fsverity_end_enable,
.get_verity_descriptor = xfs_fsverity_get_descriptor,
.read_merkle_tree_page = xfs_fsverity_read_merkle,
.write_merkle_tree_block = xfs_fsverity_write_merkle,
+ .file_corrupt = xfs_fsverity_file_corrupt,
};
diff --git a/fs/xfs/xfs_health.c b/fs/xfs/xfs_health.c
index 7c541fb373d5..4202b8441735 100644
--- a/fs/xfs/xfs_health.c
+++ b/fs/xfs/xfs_health.c
@@ -487,6 +487,7 @@ static const struct ioctl_sick_map ino_map[] = {
{ XFS_SICK_INO_DIR_ZAPPED, XFS_BS_SICK_DIR },
{ XFS_SICK_INO_SYMLINK_ZAPPED, XFS_BS_SICK_SYMLINK },
{ XFS_SICK_INO_DIRTREE, XFS_BS_SICK_DIRTREE },
+ { XFS_SICK_INO_DATA, XFS_BS_SICK_DATA },
};
/* Fill out bulkstat health info. */
--
2.50.0
^ permalink raw reply related [flat|nested] 75+ messages in thread
* [PATCH RFC 28/29] xfs: add fsverity traces
2025-07-28 20:30 [PATCH RFC 00/29] fs-verity support for XFS with post EOF merkle tree Andrey Albershteyn
` (26 preceding siblings ...)
2025-07-28 20:30 ` [PATCH RFC 27/29] xfs: report verity failures through the health system Andrey Albershteyn
@ 2025-07-28 20:30 ` Andrey Albershteyn
2025-07-29 23:06 ` Darrick J. Wong
2025-07-28 20:30 ` [PATCH RFC 29/29] xfs: enable ro-compat fs-verity flag Andrey Albershteyn
28 siblings, 1 reply; 75+ messages in thread
From: Andrey Albershteyn @ 2025-07-28 20:30 UTC (permalink / raw)
To: fsverity, linux-fsdevel, linux-xfs, david, djwong, ebiggers, hch
Cc: Andrey Albershteyn
Even though fsverity has traces, debugging issues with varying block
sizes could be a bit less transparent without read/write traces.
Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
---
fs/xfs/xfs_fsverity.c | 8 ++++++++
fs/xfs/xfs_trace.h | 46 ++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 54 insertions(+)
diff --git a/fs/xfs/xfs_fsverity.c b/fs/xfs/xfs_fsverity.c
index dfe7b0bcd97e..4d3fd00237b1 100644
--- a/fs/xfs/xfs_fsverity.c
+++ b/fs/xfs/xfs_fsverity.c
@@ -70,6 +70,8 @@ xfs_fsverity_get_descriptor(
};
int error = 0;
+ trace_xfs_fsverity_get_descriptor(ip);
+
/*
* The fact that (returned attribute size) == (provided buf_size) is
* checked by xfs_attr_copy_value() (returns -ERANGE). No descriptor
@@ -267,6 +269,8 @@ xfs_fsverity_read_merkle(
*/
 xfs_fsverity_adjust_read(&region);
+ trace_xfs_fsverity_read_merkle(XFS_I(inode), region.pos, region.length);
+
 folio = iomap_read_region(&region);
if (IS_ERR(folio))
return ERR_PTR(-EIO);
@@ -297,6 +301,8 @@ xfs_fsverity_write_merkle(
.ops = &xfs_buffered_write_iomap_ops,
};
+ trace_xfs_fsverity_write_merkle(XFS_I(inode), region.pos, region.length);
+
if (region.pos + region.length > inode->i_sb->s_maxbytes)
return -EFBIG;
@@ -309,6 +315,8 @@ xfs_fsverity_file_corrupt(
loff_t pos,
size_t len)
{
+ trace_xfs_fsverity_file_corrupt(XFS_I(inode), pos, len);
+
xfs_inode_mark_sick(XFS_I(inode), XFS_SICK_INO_DATA);
}
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 50034c059e8c..4477d5412e53 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -5979,6 +5979,52 @@ DEFINE_EVENT(xfs_freeblocks_resv_class, name, \
DEFINE_FREEBLOCKS_RESV_EVENT(xfs_freecounter_reserved);
DEFINE_FREEBLOCKS_RESV_EVENT(xfs_freecounter_enospc);
+TRACE_EVENT(xfs_fsverity_get_descriptor,
+ TP_PROTO(struct xfs_inode *ip),
+ TP_ARGS(ip),
+ TP_STRUCT__entry(
+ __field(dev_t, dev)
+ __field(xfs_ino_t, ino)
+ ),
+ TP_fast_assign(
+ __entry->dev = VFS_I(ip)->i_sb->s_dev;
+ __entry->ino = ip->i_ino;
+ ),
+ TP_printk("dev %d:%d ino 0x%llx",
+ MAJOR(__entry->dev), MINOR(__entry->dev),
+ __entry->ino)
+);
+
+DECLARE_EVENT_CLASS(xfs_fsverity_class,
+ TP_PROTO(struct xfs_inode *ip, u64 pos, unsigned int length),
+ TP_ARGS(ip, pos, length),
+ TP_STRUCT__entry(
+ __field(dev_t, dev)
+ __field(xfs_ino_t, ino)
+ __field(u64, pos)
+ __field(unsigned int, length)
+ ),
+ TP_fast_assign(
+ __entry->dev = VFS_I(ip)->i_sb->s_dev;
+ __entry->ino = ip->i_ino;
+ __entry->pos = pos;
+ __entry->length = length;
+ ),
+ TP_printk("dev %d:%d ino 0x%llx pos %llx length %x",
+ MAJOR(__entry->dev), MINOR(__entry->dev),
+ __entry->ino,
+ __entry->pos,
+ __entry->length)
+)
+
+#define DEFINE_FSVERITY_EVENT(name) \
+DEFINE_EVENT(xfs_fsverity_class, name, \
+ TP_PROTO(struct xfs_inode *ip, u64 pos, unsigned int length), \
+ TP_ARGS(ip, pos, length))
+DEFINE_FSVERITY_EVENT(xfs_fsverity_read_merkle);
+DEFINE_FSVERITY_EVENT(xfs_fsverity_write_merkle);
+DEFINE_FSVERITY_EVENT(xfs_fsverity_file_corrupt);
+
#endif /* _TRACE_XFS_H */
#undef TRACE_INCLUDE_PATH
--
2.50.0
^ permalink raw reply related [flat|nested] 75+ messages in thread
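[Editorial sketch, not part of the patch.] The read/write tracepoints above
print the page-cache position in hex, with the Merkle tree offset bit (1 << 53)
already set. The standalone C below shows, as a rough illustration, how such a
position is built, how to recover the offset within the tree from a traced
value, and how the length clamp in xfs_fsverity_adjust_read() behaves at the
end of the tree. The helper names are hypothetical; the constants mirror
XFS_FSVERITY_MTREE_OFFSET and XFS_FSVERITY_MTREE_MASK from patch 22.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors of the patch's constants, redefined here for userspace. */
#define MTREE_OFFSET	(UINT64_C(1) << 53)
#define MTREE_MASK	(MTREE_OFFSET - 1)

/* Page-cache position of tree page @index, as the read path builds it. */
uint64_t mtree_pos(uint64_t index, unsigned int page_shift)
{
	return (index << page_shift) | MTREE_OFFSET;
}

/* Offset within the Merkle tree for a traced position. */
uint64_t mtree_tree_offset(uint64_t pos)
{
	return pos & MTREE_MASK;
}

/*
 * Clamp a read of @length bytes at @pos so it does not run past the end
 * of a tree that is @tree_size bytes long.
 */
uint64_t mtree_clamp_length(uint64_t pos, uint64_t length, uint64_t tree_size)
{
	uint64_t offset = pos & MTREE_MASK;

	/* Sketch-only precondition: the read must start inside the tree. */
	assert(offset < tree_size);

	if (offset + length < tree_size)
		return length;
	return tree_size - offset;
}
```

For example, with 4096-byte pages and a 9216-byte tree, the page at index 2
sits at tree offset 8192, so a PAGE_SIZE read there would be clamped to the
remaining 1024 bytes.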
* [PATCH RFC 29/29] xfs: enable ro-compat fs-verity flag
2025-07-28 20:30 [PATCH RFC 00/29] fs-verity support for XFS with post EOF merkle tree Andrey Albershteyn
` (27 preceding siblings ...)
2025-07-28 20:30 ` [PATCH RFC 28/29] xfs: add fsverity traces Andrey Albershteyn
@ 2025-07-28 20:30 ` Andrey Albershteyn
28 siblings, 0 replies; 75+ messages in thread
From: Andrey Albershteyn @ 2025-07-28 20:30 UTC (permalink / raw)
To: fsverity, linux-fsdevel, linux-xfs, david, djwong, ebiggers, hch
Cc: Andrey Albershteyn
From: Andrey Albershteyn <aalbersh@redhat.com>
Finalize fs-verity integration in XFS by making the kernel aware of the
fs-verity ro-compat flag.
Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: add spaces]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
fs/xfs/libxfs/xfs_format.h | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index 1cd9106d852b..b61dccf8cc38 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -378,8 +378,9 @@ xfs_sb_has_compat_feature(
#define XFS_SB_FEAT_RO_COMPAT_ALL \
(XFS_SB_FEAT_RO_COMPAT_FINOBT | \
XFS_SB_FEAT_RO_COMPAT_RMAPBT | \
- XFS_SB_FEAT_RO_COMPAT_REFLINK| \
- XFS_SB_FEAT_RO_COMPAT_INOBTCNT)
+ XFS_SB_FEAT_RO_COMPAT_REFLINK | \
+ XFS_SB_FEAT_RO_COMPAT_INOBTCNT | \
+ XFS_SB_FEAT_RO_COMPAT_VERITY)
#define XFS_SB_FEAT_RO_COMPAT_UNKNOWN ~XFS_SB_FEAT_RO_COMPAT_ALL
static inline bool
xfs_sb_has_ro_compat_feature(
--
2.50.0
^ permalink raw reply related [flat|nested] 75+ messages in thread
* Re: [PATCH RFC 03/29] fs: add FS_XFLAG_VERITY for verity files
2025-07-28 20:30 ` [PATCH RFC 03/29] fs: add FS_XFLAG_VERITY for verity files Andrey Albershteyn
@ 2025-07-29 9:53 ` Amir Goldstein
2025-07-29 10:35 ` Andrey Albershteyn
2025-08-12 7:51 ` Christoph Hellwig
1 sibling, 1 reply; 75+ messages in thread
From: Amir Goldstein @ 2025-07-29 9:53 UTC (permalink / raw)
To: Andrey Albershteyn, Christian Brauner
Cc: fsverity, linux-fsdevel, linux-xfs, david, djwong, ebiggers, hch,
Andrey Albershteyn
On Mon, Jul 28, 2025 at 10:31 PM Andrey Albershteyn <aalbersh@redhat.com> wrote:
>
> From: Andrey Albershteyn <aalbersh@redhat.com>
>
> Add extended attribute FS_XFLAG_VERITY for inodes with fs-verity
> enabled.
Oh man! Please don't refer to this as an "extended attribute".
I was quite surprised to see actually how many times the term
"extended attribute" was used in commit messages in your series
that Linus just merged including 4 such references in the Kernel-doc
comments of security_inode_file_[sg]etattr(). :-/
The terminology used in Documentation/filesystems/vfs.rst and fileattr.h
is some permutation of "miscellaneous file flags and attributes".
Not perfect, but less confusing than "extended attributes", which are
famously known as something else.
>
> Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
> [djwong: fix broken verity flag checks]
> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
> ---
> Documentation/filesystems/fsverity.rst | 8 ++++++++
> fs/ioctl.c | 11 +++++++++++
> include/uapi/linux/fs.h | 1 +
> 3 files changed, 20 insertions(+)
>
> diff --git a/Documentation/filesystems/fsverity.rst b/Documentation/filesystems/fsverity.rst
> index dacdbc1149e6..33b588c32ed1 100644
> --- a/Documentation/filesystems/fsverity.rst
> +++ b/Documentation/filesystems/fsverity.rst
> @@ -342,6 +342,14 @@ the file has fs-verity enabled. This can perform better than
> FS_IOC_GETFLAGS and FS_IOC_MEASURE_VERITY because it doesn't require
> opening the file, and opening verity files can be expensive.
>
> +FS_IOC_FSGETXATTR
> +-----------------
> +
> +Since Linux v6.17, the FS_IOC_FSGETXATTR ioctl sets FS_XFLAG_VERITY (0x00020000)
> +in the returned flags when the file has verity enabled. Note that this attribute
> +cannot be set with FS_IOC_FSSETXATTR as enabling verity requires input
> +parameters. See FS_IOC_ENABLE_VERITY.
> +
> .. _accessing_verity_files:
>
> Accessing verity files
> diff --git a/fs/ioctl.c b/fs/ioctl.c
> index 69107a245b4c..6b94da2b93f5 100644
> --- a/fs/ioctl.c
> +++ b/fs/ioctl.c
> @@ -480,6 +480,8 @@ void fileattr_fill_xflags(struct fileattr *fa, u32 xflags)
> fa->flags |= FS_DAX_FL;
> if (fa->fsx_xflags & FS_XFLAG_PROJINHERIT)
> fa->flags |= FS_PROJINHERIT_FL;
> + if (fa->fsx_xflags & FS_XFLAG_VERITY)
> + fa->flags |= FS_VERITY_FL;
> }
> EXPORT_SYMBOL(fileattr_fill_xflags);
>
> @@ -510,6 +512,8 @@ void fileattr_fill_flags(struct fileattr *fa, u32 flags)
> fa->fsx_xflags |= FS_XFLAG_DAX;
> if (fa->flags & FS_PROJINHERIT_FL)
> fa->fsx_xflags |= FS_XFLAG_PROJINHERIT;
> + if (fa->flags & FS_VERITY_FL)
> + fa->fsx_xflags |= FS_XFLAG_VERITY;
> }
> EXPORT_SYMBOL(fileattr_fill_flags);
>
I think you should add it to FS_COMMON_FL/FS_XFLAG_COMMON?
And I guess also to FS_XFLAGS_MASK/FS_XFLAG_RDONLY_MASK
after rebasing to master.
> @@ -640,6 +644,13 @@ static int fileattr_set_prepare(struct inode *inode,
> !(S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode)))
> return -EINVAL;
>
> + /*
> + * Verity cannot be changed through FS_IOC_FSSETXATTR/FS_IOC_SETFLAGS.
> + * See FS_IOC_ENABLE_VERITY.
> + */
> + if ((fa->fsx_xflags ^ old_ma->fsx_xflags) & FS_XFLAG_VERITY)
> + return -EINVAL;
> +
I think that after rebase, if you add the flag to FS_XFLAG_RDONLY_MASK,
this check will fail, so you can either remove this check and ignore a user
trying to set FS_XFLAG_VERITY, same as if the user was trying to set
FS_XFLAG_HASATTR, or, as I think Darrick may have suggested during review,
remove the masking of FS_XFLAG_RDONLY_MASK in the fill helpers and make this
check more generic here:
+ if ((fa->fsx_xflags ^ old_ma->fsx_xflags) & FS_XFLAG_RDONLY_MASK)
+ return -EINVAL;
Thanks,
Amir.
^ permalink raw reply [flat|nested] 75+ messages in thread
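[Editorial sketch, not part of the thread.] The two options discussed in the
reply above (silently ignoring read-only flags versus rejecting any attempt to
change them) can be compared side by side in standalone C. Everything here is
illustrative: the DEMO_ names are hypothetical, the flag values are copied
from the uapi header and the documentation hunk quoted above, and the
composition of the read-only mask is an assumption rather than what any
kernel tree actually defines.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Flag values as in the uapi header; DEMO_ prefix avoids name clashes. */
#define DEMO_XFLAG_PREALLOC	UINT32_C(0x00000002)
#define DEMO_XFLAG_APPEND	UINT32_C(0x00000010)
#define DEMO_XFLAG_VERITY	UINT32_C(0x00020000)
#define DEMO_XFLAG_HASATTR	UINT32_C(0x80000000)

/* Assumed mask of flags userspace may read but not change. */
#define DEMO_XFLAG_RDONLY_MASK \
	(DEMO_XFLAG_PREALLOC | DEMO_XFLAG_HASATTR | DEMO_XFLAG_VERITY)

static_assert((DEMO_XFLAG_RDONLY_MASK & DEMO_XFLAG_VERITY) != 0,
	      "this sketch assumes the verity flag is read-only");

/* Option 1: ignore the request and keep the old read-only bits. */
uint32_t xflags_ignore_rdonly(uint32_t old_xflags, uint32_t new_xflags)
{
	return (new_xflags & ~DEMO_XFLAG_RDONLY_MASK) |
	       (old_xflags & DEMO_XFLAG_RDONLY_MASK);
}

/* Option 2: reject any request that would flip a read-only bit. */
int xflags_reject_rdonly_change(uint32_t old_xflags, uint32_t new_xflags)
{
	if ((new_xflags ^ old_xflags) & DEMO_XFLAG_RDONLY_MASK)
		return -EINVAL;
	return 0;
}
```

Option 1 lets a get/modify/set round trip succeed even if the caller passes
stale read-only bits, while option 2 surfaces the mistake to the caller as
-EINVAL. Which behaviour is preferable is the open question in the thread.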
* Re: [PATCH RFC 03/29] fs: add FS_XFLAG_VERITY for verity files
2025-07-29 9:53 ` Amir Goldstein
@ 2025-07-29 10:35 ` Andrey Albershteyn
2025-07-29 12:06 ` Amir Goldstein
0 siblings, 1 reply; 75+ messages in thread
From: Andrey Albershteyn @ 2025-07-29 10:35 UTC (permalink / raw)
To: Amir Goldstein
Cc: Christian Brauner, fsverity, linux-fsdevel, linux-xfs, david,
djwong, ebiggers, hch, Andrey Albershteyn
On 2025-07-29 11:53:09, Amir Goldstein wrote:
> On Mon, Jul 28, 2025 at 10:31 PM Andrey Albershteyn <aalbersh@redhat.com> wrote:
> >
> > From: Andrey Albershteyn <aalbersh@redhat.com>
> >
> > Add extended attribute FS_XFLAG_VERITY for inodes with fs-verity
> > enabled.
>
> Oh man! Please don't refer to this as an "extended attribute".
>
> I was quite surprised to see actually how many times the term
> "extended attribute" was used in commit messages in your series
> that Linus just merged including 4 such references in the Kernel-doc
> comments of security_inode_file_[sg]etattr(). :-/
You can comment on this, I'm fine with fixing these, I definitely
don't have all the context to know the best suitable terms.
>
> The terminology used in Documentation/filesystems/vfs.rst and fileattr.h
> is some permutation of "miscellaneous file flags and attributes".
> Not perfect, but less confusing than "extended attributes", which are
> famously known as something else.
Yeah sorry, it's very difficult to find out what is named how and
what is outdated.
I will update commit messages to "file flags".
>
> >
> > Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
> > [djwong: fix broken verity flag checks]
> > Reviewed-by: Darrick J. Wong <djwong@kernel.org>
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
> > ---
> > Documentation/filesystems/fsverity.rst | 8 ++++++++
> > fs/ioctl.c | 11 +++++++++++
> > include/uapi/linux/fs.h | 1 +
> > 3 files changed, 20 insertions(+)
> >
> > diff --git a/Documentation/filesystems/fsverity.rst b/Documentation/filesystems/fsverity.rst
> > index dacdbc1149e6..33b588c32ed1 100644
> > --- a/Documentation/filesystems/fsverity.rst
> > +++ b/Documentation/filesystems/fsverity.rst
> > @@ -342,6 +342,14 @@ the file has fs-verity enabled. This can perform better than
> > FS_IOC_GETFLAGS and FS_IOC_MEASURE_VERITY because it doesn't require
> > opening the file, and opening verity files can be expensive.
> >
> > +FS_IOC_FSGETXATTR
> > +-----------------
> > +
> > +Since Linux v6.17, the FS_IOC_FSGETXATTR ioctl sets FS_XFLAG_VERITY (0x00020000)
> > +in the returned flags when the file has verity enabled. Note that this attribute
> > +cannot be set with FS_IOC_FSSETXATTR as enabling verity requires input
> > +parameters. See FS_IOC_ENABLE_VERITY.
> > +
> > .. _accessing_verity_files:
> >
> > Accessing verity files
> > diff --git a/fs/ioctl.c b/fs/ioctl.c
> > index 69107a245b4c..6b94da2b93f5 100644
> > --- a/fs/ioctl.c
> > +++ b/fs/ioctl.c
> > @@ -480,6 +480,8 @@ void fileattr_fill_xflags(struct fileattr *fa, u32 xflags)
> > fa->flags |= FS_DAX_FL;
> > if (fa->fsx_xflags & FS_XFLAG_PROJINHERIT)
> > fa->flags |= FS_PROJINHERIT_FL;
> > + if (fa->fsx_xflags & FS_XFLAG_VERITY)
> > + fa->flags |= FS_VERITY_FL;
> > }
> > EXPORT_SYMBOL(fileattr_fill_xflags);
> >
> > @@ -510,6 +512,8 @@ void fileattr_fill_flags(struct fileattr *fa, u32 flags)
> > fa->fsx_xflags |= FS_XFLAG_DAX;
> > if (fa->flags & FS_PROJINHERIT_FL)
> > fa->fsx_xflags |= FS_XFLAG_PROJINHERIT;
> > + if (fa->flags & FS_VERITY_FL)
> > + fa->fsx_xflags |= FS_XFLAG_VERITY;
> > }
> > EXPORT_SYMBOL(fileattr_fill_flags);
> >
>
> I think you should add it to FS_COMMON_FL/FS_XFLAG_COMMON?
>
> And I guess also to FS_XFLAGS_MASK/FS_XFLAG_RDONLY_MASK
> after rebasing to master.
>
> > @@ -640,6 +644,13 @@ static int fileattr_set_prepare(struct inode *inode,
> > !(S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode)))
> > return -EINVAL;
> >
> > + /*
> > + * Verity cannot be changed through FS_IOC_FSSETXATTR/FS_IOC_SETFLAGS.
> > + * See FS_IOC_ENABLE_VERITY.
> > + */
> > + if ((fa->fsx_xflags ^ old_ma->fsx_xflags) & FS_XFLAG_VERITY)
> > + return -EINVAL;
> > +
>
> I think that after rebase, if you add the flag to FS_XFLAG_RDONLY_MASK
> This check will fail, so can either remove this check and ignore user trying to
> set FS_XFLAG_VERITY, same as if user was trying to set FS_XFLAG_HASATTR.
> or, as I think Darrick may have suggested during review, we remove the masking
> of FS_XFLAG_RDONLY_MASK in the fill helpers and we make this check more
> generic here:
>
> + if ((fa->fsx_xflags ^ old_ma->fsx_xflags) & FS_XFLAG_RDONLY_MASK)
> + return -EINVAL;
Sure, I will look into that in the next revision. I haven't used the
latest master, so this wasn't updated.
>
> Thanks,
> Amir.
>
--
- Andrey
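
For reference, the documentation hunk quoted above could be exercised from
userspace roughly as follows. This is a sketch under assumptions: the helper
name demo_file_has_verity() is hypothetical, and FS_XFLAG_VERITY is defined
locally with the value proposed by this patch because existing uapi headers
may not carry it. On kernels without the patch the bit is simply never
reported.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>

/* Value proposed by this patch; older uapi headers do not define it. */
#ifndef FS_XFLAG_VERITY
#define FS_XFLAG_VERITY 0x00020000
#endif

/*
 * Hypothetical helper: returns 1 if FS_IOC_FSGETXATTR reports the verity
 * flag, 0 if it does not, and -1 if the file cannot be opened or the
 * filesystem does not support the ioctl.
 */
int demo_file_has_verity(const char *path)
{
	struct fsxattr fsx;
	int fd, ret;

	assert(path != NULL);

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	ret = ioctl(fd, FS_IOC_FSGETXATTR, &fsx);
	close(fd);
	if (ret < 0)
		return -1;

	return (fsx.fsx_xflags & FS_XFLAG_VERITY) ? 1 : 0;
}
```

The flag is read-only from this interface; enabling verity still goes
through FS_IOC_ENABLE_VERITY, as the quoted documentation says.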
^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [PATCH RFC 03/29] fs: add FS_XFLAG_VERITY for verity files
2025-07-29 10:35 ` Andrey Albershteyn
@ 2025-07-29 12:06 ` Amir Goldstein
0 siblings, 0 replies; 75+ messages in thread
From: Amir Goldstein @ 2025-07-29 12:06 UTC (permalink / raw)
To: Andrey Albershteyn
Cc: Christian Brauner, fsverity, linux-fsdevel, linux-xfs, david,
djwong, ebiggers, hch, Andrey Albershteyn
On Tue, Jul 29, 2025 at 12:35 PM Andrey Albershteyn <aalbersh@redhat.com> wrote:
>
> On 2025-07-29 11:53:09, Amir Goldstein wrote:
> > On Mon, Jul 28, 2025 at 10:31 PM Andrey Albershteyn <aalbersh@redhat.com> wrote:
> > >
> > > From: Andrey Albershteyn <aalbersh@redhat.com>
> > >
> > > Add extended attribute FS_XFLAG_VERITY for inodes with fs-verity
> > > enabled.
> >
> > Oh man! Please don't refer to this as an "extended attribute".
> >
> > I was quite surprised to see actually how many times the term
> > "extended attribute" was used in commit messages in your series
> > that Linus just merged including 4 such references in the Kernel-doc
> > comments of security_inode_file_[sg]etattr(). :-/
>
> You can comment on this, I'm fine with fixing these, I definitely
> don't have all the context to know the best suitable terms.
>
> >
> > The terminology used in Documentation/filesystem/vfs.rst and fileattr.h
> > are some permutations of "miscellaneous file flags and attributes".
> > Not perfect, but less confusing than "extended attributes", which are
> > famously known as something else.
>
> Yeah sorry, it's very difficult to find out what is named how and
> what is outdated.
indeed.
>
> I will update commit messages to "file flags".
>
"file attributes" is a better fit IMO, considering the method names fileattr_*,
the structs file_attr and file_kattr, and the comments in file_attr.c/fileattr.h.
Thanks,
Amir.
^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [PATCH RFC 26/29] xfs: fix scrub trace with null pointer in quotacheck
2025-07-28 20:30 ` [PATCH RFC 26/29] xfs: fix scrub trace with null pointer in quotacheck Andrey Albershteyn
@ 2025-07-29 15:28 ` Darrick J. Wong
2025-07-31 14:54 ` Andrey Albershteyn
0 siblings, 1 reply; 75+ messages in thread
From: Darrick J. Wong @ 2025-07-29 15:28 UTC (permalink / raw)
To: Andrey Albershteyn
Cc: fsverity, linux-fsdevel, linux-xfs, david, ebiggers, hch,
Andrey Albershteyn
On Mon, Jul 28, 2025 at 10:30:30PM +0200, Andrey Albershteyn wrote:
> The quotacheck doesn't initialize sc->ip.
>
> Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
Looks good,
Cc: <stable@vger.kernel.org> # v6.8
Fixes: 21d7500929c8a0 ("xfs: improve dquot iteration for scrub")
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
--D
> ---
> fs/xfs/scrub/trace.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h
> index d7c4ced47c15..4368f08e91c6 100644
> --- a/fs/xfs/scrub/trace.h
> +++ b/fs/xfs/scrub/trace.h
> @@ -479,7 +479,7 @@ DECLARE_EVENT_CLASS(xchk_dqiter_class,
> __field(xfs_exntst_t, state)
> ),
> TP_fast_assign(
> - __entry->dev = cursor->sc->ip->i_mount->m_super->s_dev;
> + __entry->dev = cursor->sc->mp->m_super->s_dev;
> __entry->dqtype = cursor->dqtype;
> __entry->ino = cursor->quota_ip->i_ino;
> __entry->cur_id = cursor->id;
>
> --
> 2.50.0
>
>
^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [PATCH RFC 01/29] iomap: add iomap_writepages_unbound() to write beyond EOF
2025-07-28 20:30 ` [PATCH RFC 01/29] iomap: add iomap_writepages_unbound() to write beyond EOF Andrey Albershteyn
@ 2025-07-29 22:07 ` Darrick J. Wong
2025-07-31 15:04 ` Andrey Albershteyn
2025-07-31 18:43 ` Joanne Koong
1 sibling, 1 reply; 75+ messages in thread
From: Darrick J. Wong @ 2025-07-29 22:07 UTC (permalink / raw)
To: Andrey Albershteyn
Cc: fsverity, linux-fsdevel, linux-xfs, david, ebiggers, hch,
Andrey Albershteyn
On Mon, Jul 28, 2025 at 10:30:05PM +0200, Andrey Albershteyn wrote:
> From: Andrey Albershteyn <aalbersh@redhat.com>
>
> Add iomap_writepages_unbound() without limit in form of EOF. XFS
> will use this to write metadata (fs-verity Merkle tree) in range far
> beyond EOF.
...and I guess some day fscrypt might use it to encrypt merkle trees
too?
> Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
> ---
> fs/iomap/buffered-io.c | 51 +++++++++++++++++++++++++++++++++++++++-----------
> include/linux/iomap.h | 3 +++
> 2 files changed, 43 insertions(+), 11 deletions(-)
>
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 3729391a18f3..7bef232254a3 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -1881,18 +1881,10 @@ static int iomap_writepage_map(struct iomap_writepage_ctx *wpc,
> int error = 0;
> u32 rlen;
>
> - WARN_ON_ONCE(!folio_test_locked(folio));
> - WARN_ON_ONCE(folio_test_dirty(folio));
> - WARN_ON_ONCE(folio_test_writeback(folio));
> -
> - trace_iomap_writepage(inode, pos, folio_size(folio));
> -
> - if (!iomap_writepage_handle_eof(folio, inode, &end_pos)) {
> - folio_unlock(folio);
> - return 0;
> - }
> WARN_ON_ONCE(end_pos <= pos);
>
> + trace_iomap_writepage(inode, pos, folio_size(folio));
> +
> if (i_blocks_per_folio(inode, folio) > 1) {
> if (!ifs) {
> ifs = ifs_alloc(inode, folio, 0);
> @@ -1956,6 +1948,23 @@ static int iomap_writepage_map(struct iomap_writepage_ctx *wpc,
> return error;
> }
>
> +/* Map pages bound by EOF */
> +static int iomap_writepage_map_eof(struct iomap_writepage_ctx *wpc,
iomap_writepage_map_within_eof ?
> + struct writeback_control *wbc, struct folio *folio)
> +{
> + int error;
> + struct inode *inode = folio->mapping->host;
> + u64 end_pos = folio_pos(folio) + folio_size(folio);
> +
> + if (!iomap_writepage_handle_eof(folio, inode, &end_pos)) {
> + folio_unlock(folio);
> + return 0;
> + }
> +
> + error = iomap_writepage_map(wpc, wbc, folio);
> + return error;
> +}
> +
> int
> iomap_writepages(struct address_space *mapping, struct writeback_control *wbc,
> struct iomap_writepage_ctx *wpc,
> @@ -1972,9 +1981,29 @@ iomap_writepages(struct address_space *mapping, struct writeback_control *wbc,
> PF_MEMALLOC))
> return -EIO;
>
> + wpc->ops = ops;
> + while ((folio = writeback_iter(mapping, wbc, folio, &error))) {
> + WARN_ON_ONCE(!folio_test_locked(folio));
> + WARN_ON_ONCE(folio_test_dirty(folio));
> + WARN_ON_ONCE(folio_test_writeback(folio));
> +
> + error = iomap_writepage_map_eof(wpc, wbc, folio);
> + }
> + return iomap_submit_ioend(wpc, error);
> +}
> +EXPORT_SYMBOL_GPL(iomap_writepages);
> +
> +int
> +iomap_writepages_unbound(struct address_space *mapping, struct writeback_control *wbc,
Might want to leave a comment here explaining what's the difference
between the two iomap_writepages exports:
/*
* Write dirty pages, including any beyond EOF. This is solely for use
* by files that allow post-EOF pagecache, which means fsverity.
*/
> + struct iomap_writepage_ctx *wpc,
> + const struct iomap_writeback_ops *ops)
> +{
> + struct folio *folio = NULL;
> + int error;
> +
...and you might want a:
WARN_ON(!IS_VERITY(wpc->inode));
to keep it that way.
--D
> wpc->ops = ops;
> while ((folio = writeback_iter(mapping, wbc, folio, &error)))
> error = iomap_writepage_map(wpc, wbc, folio);
> return iomap_submit_ioend(wpc, error);
> }
> -EXPORT_SYMBOL_GPL(iomap_writepages);
> +EXPORT_SYMBOL_GPL(iomap_writepages_unbound);
> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index 522644d62f30..4a0b5ebb79e9 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -464,6 +464,9 @@ void iomap_sort_ioends(struct list_head *ioend_list);
> int iomap_writepages(struct address_space *mapping,
> struct writeback_control *wbc, struct iomap_writepage_ctx *wpc,
> const struct iomap_writeback_ops *ops);
> +int iomap_writepages_unbound(struct address_space *mapping,
> + struct writeback_control *wbc, struct iomap_writepage_ctx *wpc,
> + const struct iomap_writeback_ops *ops);
>
> /*
> * Flags for direct I/O ->end_io:
>
> --
> 2.50.0
>
>
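
To make the difference between the two exports concrete, here is a
simplified userspace model of the decision each path takes per folio. It is
a sketch only: demo_writeback_end() is a hypothetical name, the return
convention (0 meaning "skip this folio") is invented for the example, and
other details of the kernel's EOF helper are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Return the byte position up to which a folio would be written back,
 * or 0 if the folio is skipped.
 *
 * unbound == false models iomap_writepages(): folios at or past EOF are
 * skipped and a folio straddling EOF is written only up to i_size.
 * unbound == true models iomap_writepages_unbound(): EOF is ignored.
 */
uint64_t demo_writeback_end(uint64_t folio_pos, uint64_t folio_size,
			    uint64_t isize, bool unbound)
{
	uint64_t end_pos = folio_pos + folio_size;

	assert(folio_size > 0);

	if (unbound)
		return end_pos;		/* no EOF clamp at all */
	if (folio_pos >= isize)
		return 0;		/* entirely beyond EOF: skip */
	if (end_pos > isize)
		end_pos = isize;	/* straddles EOF: clamp */
	return end_pos;
}
```

A folio at offset 1 << 53 of a small file is skipped by the bounded path and
written in full by the unbound one, which is the case the Merkle tree needs.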
^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [PATCH RFC 02/29] iomap: introduce iomap_read/write_region interface
2025-07-28 20:30 ` [PATCH RFC 02/29] iomap: introduce iomap_read/write_region interface Andrey Albershteyn
@ 2025-07-29 22:22 ` Darrick J. Wong
2025-07-31 15:51 ` Andrey Albershteyn
2025-08-11 11:43 ` Christoph Hellwig
0 siblings, 2 replies; 75+ messages in thread
From: Darrick J. Wong @ 2025-07-29 22:22 UTC (permalink / raw)
To: Andrey Albershteyn
Cc: fsverity, linux-fsdevel, linux-xfs, david, ebiggers, hch,
Andrey Albershteyn
On Mon, Jul 28, 2025 at 10:30:06PM +0200, Andrey Albershteyn wrote:
> From: Andrey Albershteyn <aalbersh@redhat.com>
>
> Interface for writing data beyond EOF into offsetted region in
> page cache.
>
> Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
> ---
> include/linux/iomap.h | 16 ++++++++
> fs/iomap/buffered-io.c | 99 +++++++++++++++++++++++++++++++++++++++++++++++++-
> 2 files changed, 114 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index 4a0b5ebb79e9..73288f28543f 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -83,6 +83,11 @@ struct vm_fault;
> */
> #define IOMAP_F_PRIVATE (1U << 12)
>
> +/*
> + * Writes happens beyound inode EOF
> + */
> +#define IOMAP_F_BEYOND_EOF (1U << 13)
> +
> /*
> * Flags set by the core iomap code during operations:
> *
> @@ -533,4 +538,15 @@ int iomap_swapfile_activate(struct swap_info_struct *sis,
>
> extern struct bio_set iomap_ioend_bioset;
>
> +struct ioregion {
> + struct inode *inode;
> + loff_t pos; /* IO position */
> + const void *buf; /* Data to be written (in only) */
> + size_t length; /* Length of the date */
Length of the data ?
> + const struct iomap_ops *ops;
> +};
This sounds like a kiocb and a kvec...
> +
> +struct folio *iomap_read_region(struct ioregion *region);
> +int iomap_write_region(struct ioregion *region);
...and these sound a lot like filemap_read and iomap_write_iter.
Why not use those? You'd get readahead for free. Though I guess
filemap_read cuts off at i_size so maybe that's why this is necessary?
(and by extension, is this why the existing fsverity implementations
seem to do their own readahead and reading?)
((and now I guess I see why this isn't done through the regular kiocb
interface, because then we'd be exposing post-EOF data hiding to
everyone in the system))
> #endif /* LINUX_IOMAP_H */
>
> --
> 2.50.0
>
>
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 7bef232254a3..e959a206cba9 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -321,6 +321,7 @@ struct iomap_readpage_ctx {
> bool cur_folio_in_bio;
> struct bio *bio;
> struct readahead_control *rac;
> + int flags;
What flags go in here?
> };
>
> /**
> @@ -387,7 +388,8 @@ static int iomap_readpage_iter(struct iomap_iter *iter,
> if (plen == 0)
> goto done;
>
> - if (iomap_block_needs_zeroing(iter, pos)) {
> + if (iomap_block_needs_zeroing(iter, pos) &&
> + !(iomap->flags & IOMAP_F_BEYOND_EOF)) {
> folio_zero_range(folio, poff, plen);
> iomap_set_range_uptodate(folio, poff, plen);
> goto done;
> @@ -2007,3 +2009,98 @@ iomap_writepages_unbound(struct address_space *mapping, struct writeback_control
> return iomap_submit_ioend(wpc, error);
> }
> EXPORT_SYMBOL_GPL(iomap_writepages_unbound);
> +
> +struct folio *
> +iomap_read_region(struct ioregion *region)
> +{
> + struct inode *inode = region->inode;
> + fgf_t fgp = FGP_CREAT | FGP_LOCK | fgf_set_order(region->length);
> + pgoff_t index = (region->pos) >> PAGE_SHIFT;
> + struct folio *folio = __filemap_get_folio(inode->i_mapping, index, fgp,
> + mapping_gfp_mask(inode->i_mapping));
> + int ret;
> + struct iomap_iter iter = {
> + .inode = folio->mapping->host,
> + .pos = region->pos,
> + .len = region->length,
> + };
> + struct iomap_readpage_ctx ctx = {
> + .cur_folio = folio,
> + };
> +
> + if (folio_test_uptodate(folio)) {
> + folio_unlock(folio);
> + return folio;
> + }
> +
> + while ((ret = iomap_iter(&iter, region->ops)) > 0)
> + iter.status = iomap_read_folio_iter(&iter, &ctx);
Huh, we don't read into region->buf? Oh, I see, this gets iomap to
install an uptodate folio in the pagecache, and then later we can
just hand it to fsverity. Maybe?
--D
> +
> + if (ctx.bio) {
> + submit_bio(ctx.bio);
> + WARN_ON_ONCE(!ctx.cur_folio_in_bio);
> + } else {
> + WARN_ON_ONCE(ctx.cur_folio_in_bio);
> + folio_unlock(folio);
> + }
> +
> + return folio;
> +}
> +EXPORT_SYMBOL_GPL(iomap_read_region);
> +
> +static int iomap_write_region_iter(struct iomap_iter *iter, const void *buf)
> +{
> + loff_t pos = iter->pos;
> + u64 bytes = iomap_length(iter);
> + int status;
> +
> + do {
> + struct folio *folio;
> + size_t offset;
> + bool ret;
Is balance_dirty_pages_ratelimited_flags needed here if we're at the dirty
thresholds?
> +
> + bytes = min_t(u64, SIZE_MAX, bytes);
> + status = iomap_write_begin(iter, &folio, &offset, &bytes);
> + if (status)
> + return status;
> + if (iter->iomap.flags & IOMAP_F_STALE)
> + break;
> +
> + offset = offset_in_folio(folio, pos);
> + if (bytes > folio_size(folio) - offset)
> + bytes = folio_size(folio) - offset;
> +
> + memcpy_to_folio(folio, offset, buf, bytes);
> +
> + ret = iomap_write_end(iter, bytes, bytes, folio);
> + if (WARN_ON_ONCE(!ret))
> + return -EIO;
> +
> + __iomap_put_folio(iter, bytes, folio);
> + if (WARN_ON_ONCE(!ret))
> + return -EIO;
> +
> + status = iomap_iter_advance(iter, &bytes);
> + if (status)
> + break;
> + } while (bytes > 0);
> +
> + return status;
> +}
Hrm, stripped down version of iomap_write_iter without the isize
updates.
--D
> +
> +int
> +iomap_write_region(struct ioregion *region)
> +{
> + struct iomap_iter iter = {
> + .inode = region->inode,
> + .pos = region->pos,
> + .len = region->length,
> + };
> + ssize_t ret;
> +
> + while ((ret = iomap_iter(&iter, region->ops)) > 0)
> + iter.status = iomap_write_region_iter(&iter, region->buf);
> +
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(iomap_write_region);
^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [PATCH RFC 20/29] xfs: disable preallocations for fsverity Merkle tree writes
2025-07-28 20:30 ` [PATCH RFC 20/29] xfs: disable preallocations for fsverity Merkle tree writes Andrey Albershteyn
@ 2025-07-29 22:27 ` Darrick J. Wong
2025-07-31 11:42 ` Andrey Albershteyn
0 siblings, 1 reply; 75+ messages in thread
From: Darrick J. Wong @ 2025-07-29 22:27 UTC (permalink / raw)
To: Andrey Albershteyn
Cc: fsverity, linux-fsdevel, linux-xfs, david, ebiggers, hch,
Andrey Albershteyn
On Mon, Jul 28, 2025 at 10:30:24PM +0200, Andrey Albershteyn wrote:
> While writing Merkle tree, file is read-only and there's no further
> writes except Merkle tree building. The file is truncated beforehand to
> remove any preallocated extents.
>
> The Merkle tree is the only data XFS will write. As we don't want XFS to
> truncate file after we done writing, let's also skip truncation on
> fsverity files. Therefore, we also need to disable preallocations while
> writing merkle tree as we don't want any unused extents past the tree.
>
> Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
> ---
> fs/xfs/xfs_iomap.c | 13 ++++++++++++-
> 1 file changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index ff05e6b1b0bb..00ec1a738b39 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -32,6 +32,8 @@
> #include "xfs_rtbitmap.h"
> #include "xfs_icache.h"
> #include "xfs_zone_alloc.h"
> +#include "xfs_fsverity.h"
> +#include <linux/fsverity.h>
What do these includes pull in for the iflags tests below?
> #define XFS_ALLOC_ALIGN(mp, off) \
> (((off) >> mp->m_allocsize_log) << mp->m_allocsize_log)
> @@ -1849,7 +1851,9 @@ xfs_buffered_write_iomap_begin(
> * Determine the initial size of the preallocation.
> * We clean up any extra preallocation when the file is closed.
> */
> - if (xfs_has_allocsize(mp))
> + if (xfs_iflags_test(ip, XFS_VERITY_CONSTRUCTION))
> + prealloc_blocks = 0;
> + else if (xfs_has_allocsize(mp))
> prealloc_blocks = mp->m_allocsize_blocks;
> else if (allocfork == XFS_DATA_FORK)
> prealloc_blocks = xfs_iomap_prealloc_size(ip, allocfork,
> @@ -1976,6 +1980,13 @@ xfs_buffered_write_iomap_end(
> if (flags & IOMAP_FAULT)
> return 0;
>
> + /*
> + * While writing Merkle tree to disk we would not have any other
> + * delayed allocations
> + */
> + if (xfs_iflags_test(XFS_I(inode), XFS_VERITY_CONSTRUCTION))
> + return 0;
I assume XFS_VERITY_CONSTRUCTION doesn't get set until after we've
locked the inode, flushed the dirty pagecache, and truncated the file to
EOF? In which case I guess this is ok -- we're never going to have new
delalloc reservations, and verity data can't be straddling the EOF
folio, no matter how large it is. Right?
--D
> +
> /* Nothing to do if we've written the entire delalloc extent */
> start_byte = iomap_last_written_block(inode, offset, written);
> end_byte = round_up(offset + length, i_blocksize(inode));
>
> --
> 2.50.0
>
>
^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [PATCH RFC 21/29] xfs: add writeback and iomap reading of Merkel tree pages
2025-07-28 20:30 ` [PATCH RFC 21/29] xfs: add writeback and iomap reading of Merkel tree pages Andrey Albershteyn
@ 2025-07-29 22:33 ` Darrick J. Wong
0 siblings, 0 replies; 75+ messages in thread
From: Darrick J. Wong @ 2025-07-29 22:33 UTC (permalink / raw)
To: Andrey Albershteyn
Cc: fsverity, linux-fsdevel, linux-xfs, david, ebiggers, hch,
Andrey Albershteyn
> Subject: [PATCH RFC 21/29] xfs: add writeback and iomap reading of Merkel tree pages
s/Merkel/Merkle/
On Mon, Jul 28, 2025 at 10:30:25PM +0200, Andrey Albershteyn wrote:
> In the writeback path use unbound write interface, meaning that inode
> size is not updated and none of the file size checks are applied.
>
> In read path let iomap know that data is stored beyond EOF via flag.
> This leads to skipping of post EOF zeroing.
>
> Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
> ---
> fs/xfs/xfs_aops.c | 21 ++++++++++++++-------
> fs/xfs/xfs_iomap.c | 9 +++++++--
> 2 files changed, 21 insertions(+), 9 deletions(-)
>
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 63151feb9c3f..02e2c04b36c1 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -22,6 +22,7 @@
> #include "xfs_icache.h"
> #include "xfs_zone_alloc.h"
> #include "xfs_rtgroup.h"
> +#include "xfs_fsverity.h"
>
> struct xfs_writepage_ctx {
> struct iomap_writepage_ctx ctx;
> @@ -628,10 +629,12 @@ static const struct iomap_writeback_ops xfs_zoned_writeback_ops = {
>
> STATIC int
> xfs_vm_writepages(
> - struct address_space *mapping,
> - struct writeback_control *wbc)
> + struct address_space *mapping,
> + struct writeback_control *wbc)
> {
> - struct xfs_inode *ip = XFS_I(mapping->host);
> + struct xfs_inode *ip = XFS_I(mapping->host);
> + struct xfs_writepage_ctx wpc = { };
> +
>
> xfs_iflags_clear(ip, XFS_ITRUNCATED);
>
> @@ -644,12 +647,16 @@ xfs_vm_writepages(
> if (xc.open_zone)
> xfs_open_zone_put(xc.open_zone);
> return error;
> - } else {
> - struct xfs_writepage_ctx wpc = { };
> + }
>
> - return iomap_writepages(mapping, wbc, &wpc.ctx,
> - &xfs_writeback_ops);
> + if (xfs_iflags_test(ip, XFS_VERITY_CONSTRUCTION)) {
> + wbc->range_start = XFS_FSVERITY_MTREE_OFFSET;
Where is XFS_FSVERITY_MTREE_OFFSET defined?
(Oh, the next patch)
Do you need to update wbc->nr if you change range_start?
--D
> + wbc->range_end = LLONG_MAX;
> + return iomap_writepages_unbound(mapping, wbc, &wpc.ctx,
> + &xfs_writeback_ops);
> }
> +
> + return iomap_writepages(mapping, wbc, &wpc.ctx, &xfs_writeback_ops);
> }
>
> STATIC int
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index 00ec1a738b39..c8725508165c 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -2031,6 +2031,7 @@ xfs_read_iomap_begin(
> bool shared = false;
> unsigned int lockmode = XFS_ILOCK_SHARED;
> u64 seq;
> + int iomap_flags;
>
> ASSERT(!(flags & (IOMAP_WRITE | IOMAP_ZERO)));
>
> @@ -2050,8 +2051,12 @@ xfs_read_iomap_begin(
> if (error)
> return error;
> trace_xfs_iomap_found(ip, offset, length, XFS_DATA_FORK, &imap);
> - return xfs_bmbt_to_iomap(ip, iomap, &imap, flags,
> - shared ? IOMAP_F_SHARED : 0, seq);
> + iomap_flags = shared ? IOMAP_F_SHARED : 0;
> +
> + if (fsverity_active(inode) && offset >= XFS_FSVERITY_MTREE_OFFSET)
> + iomap_flags |= IOMAP_F_BEYOND_EOF;
> +
> + return xfs_bmbt_to_iomap(ip, iomap, &imap, flags, iomap_flags, seq);
> }
>
> const struct iomap_ops xfs_read_iomap_ops = {
>
> --
> 2.50.0
>
>
^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [PATCH RFC 22/29] xfs: add fs-verity support
2025-07-28 20:30 ` [PATCH RFC 22/29] xfs: add fs-verity support Andrey Albershteyn
@ 2025-07-29 23:05 ` Darrick J. Wong
2025-07-31 14:50 ` Andrey Albershteyn
0 siblings, 1 reply; 75+ messages in thread
From: Darrick J. Wong @ 2025-07-29 23:05 UTC (permalink / raw)
To: Andrey Albershteyn
Cc: fsverity, linux-fsdevel, linux-xfs, david, ebiggers, hch,
Andrey Albershteyn
On Mon, Jul 28, 2025 at 10:30:26PM +0200, Andrey Albershteyn wrote:
> Add integration with fs-verity. XFS stores fs-verity descriptor in
> the extended file attributes and the Merkle tree is store in data fork
> beyond EOF.
>
> The descriptor is stored under "vdesc" extended attribute.
>
> The Merkle tree reading/writing is done through iomap interface. The
> data itself are at offset 1 << 53 in the inode's page cache. When XFS
> reads from this region iomap doesn't call into fsverity to verify it
> against Merkle tree. For data, verification is done on BIO completion.
>
> When fs-verity is enabled on an inode, the XFS_IVERITY_CONSTRUCTION
> flag is set meaning that the Merkle tree is being build. The
> initialization ends with storing of verity descriptor and setting
> inode on-disk flag (XFS_DIFLAG2_VERITY).
>
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
/me wonders if this is really SoB me since it's rather different from
the last time I even touched this patch. It also means I can't RVB it.
;)
> Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
> ---
> fs/xfs/Makefile | 1 +
> fs/xfs/libxfs/xfs_da_format.h | 4 +
> fs/xfs/xfs_bmap_util.c | 7 +
> fs/xfs/xfs_fsverity.c | 311 ++++++++++++++++++++++++++++++++++++++++++
> fs/xfs/xfs_fsverity.h | 28 ++++
> fs/xfs/xfs_inode.h | 6 +
> fs/xfs/xfs_super.c | 20 +++
> 7 files changed, 377 insertions(+)
>
> diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
> index 5bf501cf8271..ad66439db7bf 100644
> --- a/fs/xfs/Makefile
> +++ b/fs/xfs/Makefile
> @@ -147,6 +147,7 @@ xfs-$(CONFIG_XFS_POSIX_ACL) += xfs_acl.o
> xfs-$(CONFIG_SYSCTL) += xfs_sysctl.o
> xfs-$(CONFIG_COMPAT) += xfs_ioctl32.o
> xfs-$(CONFIG_EXPORTFS_BLOCK_OPS) += xfs_pnfs.o
> +xfs-$(CONFIG_FS_VERITY) += xfs_fsverity.o
>
> # notify failure
> ifeq ($(CONFIG_MEMORY_FAILURE),y)
> diff --git a/fs/xfs/libxfs/xfs_da_format.h b/fs/xfs/libxfs/xfs_da_format.h
> index e5274be2fe9c..b17fdbbb48aa 100644
> --- a/fs/xfs/libxfs/xfs_da_format.h
> +++ b/fs/xfs/libxfs/xfs_da_format.h
> @@ -909,4 +909,8 @@ struct xfs_parent_rec {
> __be32 p_gen;
> } __packed;
>
> +/* fs-verity ondisk xattr name used for the fsverity descriptor */
> +#define XFS_VERITY_DESCRIPTOR_NAME "vdesc"
> +#define XFS_VERITY_DESCRIPTOR_NAME_LEN (sizeof(XFS_VERITY_DESCRIPTOR_NAME) - 1)
> +
> #endif /* __XFS_DA_FORMAT_H__ */
> diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
> index 06ca11731e43..af1933129647 100644
> --- a/fs/xfs/xfs_bmap_util.c
> +++ b/fs/xfs/xfs_bmap_util.c
> @@ -31,6 +31,7 @@
> #include "xfs_rtbitmap.h"
> #include "xfs_rtgroup.h"
> #include "xfs_zone_alloc.h"
> +#include <linux/fsverity.h>
>
> /* Kernel only BMAP related definitions and functions */
>
> @@ -553,6 +554,12 @@ xfs_can_free_eofblocks(
> if (last_fsb <= end_fsb)
> return false;
>
> + /*
> + * Nothing to clean on fsverity inodes as they are read-only
> + */
> + if (IS_VERITY(VFS_I(ip)))
> + return false;
> +
> /*
> * Check if there is an post-EOF extent to free. If there are any
> * delalloc blocks attached to the inode (data fork delalloc
> diff --git a/fs/xfs/xfs_fsverity.c b/fs/xfs/xfs_fsverity.c
> new file mode 100644
> index 000000000000..ec7dea4289d5
> --- /dev/null
> +++ b/fs/xfs/xfs_fsverity.c
> @@ -0,0 +1,311 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Copyright (C) 2023 Red Hat, Inc.
> + */
> +#include "xfs.h"
> +#include "xfs_shared.h"
> +#include "xfs_format.h"
> +#include "xfs_da_format.h"
> +#include "xfs_da_btree.h"
> +#include "xfs_trans_resv.h"
> +#include "xfs_mount.h"
> +#include "xfs_inode.h"
> +#include "xfs_log_format.h"
> +#include "xfs_attr.h"
> +#include "xfs_bmap_util.h"
> +#include "xfs_log_format.h"
> +#include "xfs_trans.h"
> +#include "xfs_attr_leaf.h"
> +#include "xfs_trace.h"
> +#include "xfs_quota.h"
> +#include "xfs_ag.h"
> +#include "xfs_fsverity.h"
> +#include "xfs_iomap.h"
> +#include <linux/fsverity.h>
> +#include <linux/pagemap.h>
> +
> +/*
> + * Initialize an args structure to load or store the fsverity descriptor.
> + * Caller must ensure @args is zeroed except for value and valuelen.
> + */
> +static inline void
> +xfs_fsverity_init_vdesc_args(
> + struct xfs_inode *ip,
> + struct xfs_da_args *args)
> +{
> + args->geo = ip->i_mount->m_attr_geo;
> + args->whichfork = XFS_ATTR_FORK;
> + args->attr_filter = XFS_ATTR_VERITY;
> + args->op_flags = XFS_DA_OP_OKNOENT;
> + args->dp = ip;
> + args->owner = ip->i_ino;
> + args->name = XFS_VERITY_DESCRIPTOR_NAME;
> + args->namelen = XFS_VERITY_DESCRIPTOR_NAME_LEN;
> + xfs_attr_sethash(args);
> +}
> +
> +/* Delete the verity descriptor. */
> +static int
> +xfs_fsverity_delete_descriptor(
> + struct xfs_inode *ip)
> +{
> + struct xfs_da_args args = { };
> +
> + xfs_fsverity_init_vdesc_args(ip, &args);
> + return xfs_attr_set(&args, XFS_ATTRUPDATE_REMOVE, false);
> +}
> +
> +/* Retrieve the verity descriptor. */
> +static int
> +xfs_fsverity_get_descriptor(
> + struct inode *inode,
> + void *buf,
> + size_t buf_size)
> +{
> + struct xfs_inode *ip = XFS_I(inode);
> + struct xfs_da_args args = {
> + .value = buf,
> + .valuelen = buf_size,
> + };
> + int error = 0;
> +
> + /*
> + * The fact that (returned attribute size) == (provided buf_size) is
> + * checked by xfs_attr_copy_value() (returns -ERANGE). No descriptor
> + * is treated as a short read so that common fsverity code will
> + * complain.
> + */
> + xfs_fsverity_init_vdesc_args(ip, &args);
> + error = xfs_attr_get(&args);
> + if (error == -ENOATTR)
> + return 0;
> + if (error)
> + return error;
> +
> + return args.valuelen;
> +}
> +
> +/* Try to remove all the fsverity metadata after a failed enablement. */
> +static int
> +xfs_fsverity_delete_metadata(
> + struct xfs_inode *ip)
> +{
> + struct xfs_trans *tp;
> + struct xfs_mount *mp = ip->i_mount;
> + int error;
> +
> + error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, 0, 0, 0, &tp);
> + if (error) {
> + ASSERT(xfs_is_shutdown(mp));
> + return error;
> + }
> +
> + xfs_ilock(ip, XFS_ILOCK_EXCL);
> + xfs_trans_ijoin(tp, ip, 0);
> +
> + /*
> + * As only merkle tree is getting removed, no need to change inode size
> + */
> + error = xfs_itruncate_extents(&tp, ip, XFS_DATA_FORK, XFS_ISIZE(ip));
> + if (error)
> + goto err_cancel;
> +
> + error = xfs_trans_commit(tp);
> + xfs_iunlock(ip, XFS_ILOCK_EXCL);
> + if (error)
> + return error;
> +
> + error = xfs_fsverity_delete_descriptor(ip);
> + return error != -ENOATTR ? error : 0;
Do you need to re-truncate the file to EOF to clear out the post-eof
blocks?
> +
> +err_cancel:
> + xfs_trans_cancel(tp);
> + return error;
> +}
> +
> +
> +/* Prepare to enable fsverity by clearing old metadata. */
> +static int
> +xfs_fsverity_begin_enable(
> + struct file *filp)
> +{
> + struct inode *inode = file_inode(filp);
> + struct xfs_inode *ip = XFS_I(inode);
> + int error;
> +
> + xfs_assert_ilocked(ip, XFS_IOLOCK_EXCL);
> +
> + if (IS_DAX(inode))
> + return -EINVAL;
> +
> + if (inode->i_size > XFS_FSVERITY_MTREE_OFFSET)
> + return -EFBIG;
> +
> + if (xfs_iflags_test_and_set(ip, XFS_VERITY_CONSTRUCTION))
> + return -EBUSY;
Hm. Do we actually flush the pagecache to disk in the _begin_enable
path? /me looked briefly but didn't see anything.
> +
> + error = xfs_qm_dqattach(ip);
> + if (error)
> + return error;
> +
> + return xfs_fsverity_delete_metadata(ip);
> +}
> +
> +/* Complete (or fail) the process of enabling fsverity. */
> +static int
> +xfs_fsverity_end_enable(
> + struct file *filp,
> + const void *desc,
> + size_t desc_size,
> + u64 merkle_tree_size)
> +{
> + struct xfs_da_args args = {
> + .value = (void *)desc,
> + .valuelen = desc_size,
> + };
> + struct inode *inode = file_inode(filp);
> + struct xfs_inode *ip = XFS_I(inode);
> + struct xfs_mount *mp = ip->i_mount;
> + struct xfs_trans *tp;
> + int error = 0;
> +
> + xfs_assert_ilocked(ip, XFS_IOLOCK_EXCL);
> +
> + /* fs-verity failed, just cleanup */
> + if (desc == NULL)
> + goto out;
> +
> + xfs_fsverity_init_vdesc_args(ip, &args);
> + error = xfs_attr_set(&args, XFS_ATTRUPDATE_UPSERT, false);
> + if (error)
> + goto out;
> +
> + /*
> + * Wait for Merkel tree get written to disk before setting on-disk inode
> + * flag and clering XFS_VERITY_CONSTRUCTION
s/Merkel/Merkle/
s/clering/clearing/ , sorry if that's my typo
> + */
> + error = filemap_write_and_wait(inode->i_mapping);
> + if (error)
> + return error;
> +
> + /* Set fsverity inode flag */
> + error = xfs_trans_alloc_inode(ip, &M_RES(mp)->tr_ichange,
> + 0, 0, false, &tp);
> + if (error)
> + goto out;
> +
> + /*
> + * Ensure that we've persisted the verity information before we enable
> + * it on the inode and tell the caller we have sealed the inode.
> + */
> + ip->i_diflags2 |= XFS_DIFLAG2_VERITY;
> +
> + xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
> + xfs_trans_set_sync(tp);
> +
> + error = xfs_trans_commit(tp);
> + xfs_iunlock(ip, XFS_ILOCK_EXCL);
> +
> + if (!error)
> + inode->i_flags |= S_VERITY;
> +
> +out:
> + if (error) {
> + int error2;
> +
> + error2 = xfs_fsverity_delete_metadata(ip);
> + if (error2)
> + xfs_alert(ip->i_mount,
> +"ino 0x%llx failed to clean up new fsverity metadata, err %d",
> + ip->i_ino, error2);
> + }
> +
> + xfs_iflags_clear(ip, XFS_VERITY_CONSTRUCTION);
> + return error;
> +}
> +
> +static void
> +xfs_fsverity_adjust_read(
> + struct ioregion *region)
> +{
> + u8 log_blocksize;
> + unsigned int block_size;
> + u64 tree_size;
> + u64 position = region->pos & XFS_FSVERITY_MTREE_MASK;
> +
> + fsverity_merkle_tree_geometry(region->inode, &log_blocksize,
> + &block_size, &tree_size);
> +
> + if (position + region->length < tree_size)
> + return;
> +
> + region->length = tree_size - position;
> +}
> +
> +/* Retrieve a merkle tree block. */
> +static struct page *
> +xfs_fsverity_read_merkle(
> + struct inode *inode,
> + pgoff_t index,
> + unsigned long num_ra_pages)
> +{
> + u64 position = (index << PAGE_SHIFT) | XFS_FSVERITY_MTREE_OFFSET;
Is this shift safe on 32-bit where pgoff_t is unsigned long?
I don't think it is....?
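To illustrate the concern, here is a minimal userspace sketch. It assumes a 32-bit unsigned long (modelled with uint32_t) and a page shift of 12. The helper names and the SKETCH_* macros are hypothetical and not part of the patch; only the 1 << 53 base matches XFS_FSVERITY_MTREE_OFFSET.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT	12
#define SKETCH_MTREE_OFFSET	(1ULL << 53)	/* same value as XFS_FSVERITY_MTREE_OFFSET */

/* Stand-in for pgoff_t on a 32-bit kernel, where unsigned long is 32 bits. */
typedef uint32_t pgoff32_t;

/* Shift performed in 32 bits, as in the quoted line: high bits are lost. */
static uint64_t mtree_pos_narrow(pgoff32_t index)
{
	pgoff32_t shifted = (pgoff32_t)(index << SKETCH_PAGE_SHIFT);

	return (uint64_t)shifted | SKETCH_MTREE_OFFSET;
}

/* Widen to 64 bits before shifting so that no bits are dropped. */
static uint64_t mtree_pos_wide(pgoff32_t index)
{
	uint64_t off = (uint64_t)index << SKETCH_PAGE_SHIFT;

	/* The in-tree offset must stay below the base bit it is OR-ed with. */
	assert(off < SKETCH_MTREE_OFFSET);
	return off | SKETCH_MTREE_OFFSET;
}
```

With index = 0x100000 the narrow variant collapses to the tree base, while the widened variant keeps the 4GiB in-tree offset. In the kernel the equivalent change would presumably be a cast of index to u64 before the shift.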
> + struct ioregion region = {
> + .inode = inode,
> + .pos = position,
> + .length = PAGE_SIZE,
> + .ops = &xfs_read_iomap_ops,
> + };
> + int error;
> + struct folio *folio;
> +
> + /*
> + * As region->length is PAGE_SIZE, we have to adjust the length at the
> + * end of the tree. This handles the case when the tree block size is
> + * smaller than PAGE_SIZE.
Hrm. I wonder if we want to try to create large folios for merkle
trees?
Since the comment references PAGE_SIZE, what happens if this is an 8k
fsblock XFS filesystem where folios will always be at least 8k?
> + */
> + xfs_fsverity_adjust_read(&region);
> +
> + folio = iomap_read_region(&region);
> + if (IS_ERR(folio))
> + return ERR_PTR(-EIO);
> +
> + /* Wait for buffered read to finish */
> + error = folio_wait_locked_killable(folio);
> + if (error)
> + return ERR_PTR(error);
> + if (IS_ERR(folio) || !folio_test_uptodate(folio))
> + return ERR_PTR(-EFSCORRUPTED);
> +
> + return folio_file_page(folio, 0);
> +}
> +
> +/* Write a merkle tree block. */
> +static int
> +xfs_fsverity_write_merkle(
> + struct inode *inode,
> + const void *buf,
> + u64 pos,
> + unsigned int size)
> +{
> + struct ioregion region = {
> + .inode = inode,
> + .pos = pos | XFS_FSVERITY_MTREE_OFFSET,
> + .buf = buf,
> + .length = size,
> + .ops = &xfs_buffered_write_iomap_ops,
> + };
> +
> + if (region.pos + region.length > inode->i_sb->s_maxbytes)
> + return -EFBIG;
> +
> + return iomap_write_region(&region);
> +}
> +
> +const struct fsverity_operations xfs_fsverity_ops = {
> + .begin_enable_verity = xfs_fsverity_begin_enable,
> + .end_enable_verity = xfs_fsverity_end_enable,
> + .get_verity_descriptor = xfs_fsverity_get_descriptor,
> + .read_merkle_tree_page = xfs_fsverity_read_merkle,
> + .write_merkle_tree_block = xfs_fsverity_write_merkle,
> +};
> diff --git a/fs/xfs/xfs_fsverity.h b/fs/xfs/xfs_fsverity.h
> new file mode 100644
> index 000000000000..e063b7288dc0
> --- /dev/null
> +++ b/fs/xfs/xfs_fsverity.h
> @@ -0,0 +1,28 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Copyright (C) 2022 Red Hat, Inc.
> + */
> +#ifndef __XFS_FSVERITY_H__
> +#define __XFS_FSVERITY_H__
> +
> +/* Merkle tree location in page cache. We take memory region from the inode's
> + * address space for Merkle tree.
> + *
> + * At maximum of 8 levels with 128 hashes per block (32 bytes SHA-256) maximum
Aren't sha512 hashes allowed too? In which case it would be 64 bytes
per hash?
I suppose that doesn't really matter since... an 8EB file on a 1k
fsblock filesystem with 1k merkle tree blocks will do this:
# xfs_db /dev/sda -c "btheight -b 1024 -n $(( (2**63 - 1) / 1024 ))
64:64:0:merkle -w min"
64:64:0:merkle: worst case per 1024-byte block: 8 records (leaf) / 8
keyptrs (node)
level 0: 9007199254740991 records, 1125899906842624 blocks
level 1: 1125899906842624 records, 140737488355328 blocks
level 2: 140737488355328 records, 17592186044416 blocks
level 3: 17592186044416 records, 2199023255552 blocks
level 4: 2199023255552 records, 274877906944 blocks
level 5: 274877906944 records, 34359738368 blocks
level 6: 34359738368 records, 4294967296 blocks
level 7: 4294967296 records, 536870912 blocks
level 8: 536870912 records, 67108864 blocks
level 9: 67108864 records, 8388608 blocks
level 10: 8388608 records, 1048576 blocks
level 11: 1048576 records, 131072 blocks
level 12: 131072 records, 16384 blocks
level 13: 16384 records, 2048 blocks
level 14: 2048 records, 256 blocks
level 15: 256 records, 32 blocks
level 16: 32 records, 4 blocks
level 17: 4 records, 1 block
18 levels, 1286742750677285 blocks total
log2(1286742750677285) is 50, so your choice of 53 is fine. Though this
will far exceed the max merkle tree height anyway.
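The arithmetic behind the btheight output above can be reproduced with a small userspace sketch. It assumes the same simplified model as the listing (8 records per block, one record in the parent level per block below); the function names are hypothetical and this is not how xfs_db computes it internally.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sum the blocks of a tree where each block holds `fanout` records and
 * every level above the leaves has one record per block of the level below.
 * Stores the number of levels in *levels and returns the total block count.
 */
static uint64_t tree_total_blocks(uint64_t records, uint64_t fanout,
				  unsigned int *levels)
{
	uint64_t total = 0;
	uint64_t blocks;

	assert(fanout > 1 && records > 0);

	*levels = 0;
	do {
		/* Round up without overflowing near UINT64_MAX. */
		blocks = records / fanout + (records % fanout != 0);
		total += blocks;
		(*levels)++;
		records = blocks;
	} while (blocks > 1);

	return total;
}

/* Position of the highest set bit, i.e. floor(log2(v)) for v > 0. */
static unsigned int floor_log2(uint64_t v)
{
	unsigned int bit = 0;

	while (v >>= 1)
		bit++;
	return bit;
}
```

Calling tree_total_blocks(((1ULL << 63) - 1) / 1024, 8, &levels) gives the same 18 levels and 1286742750677285 blocks as the listing, and floor_log2() of that total is 50.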
> + * tree size is ((128^8 − 1)/(128 − 1)) = 567*10^12 blocks. This should fit in 53
> + * bits address space.
> + *
> + * At this Merkle tree size we can cover 295EB large file. This is much larger
> + * than the currently supported file size.
> + *
> + * As we allocate some pagecache space for fsverity tree we need to limit
> + * maximum fsverity file size.
> + */
> +#define XFS_FSVERITY_MTREE_OFFSET (1ULL << 53)
> +#define XFS_FSVERITY_MTREE_MASK (XFS_FSVERITY_MTREE_OFFSET - 1)
This is part of the ondisk format, so it belongs in xfs_fs.h.
--D
> +#ifdef CONFIG_FS_VERITY
> +extern const struct fsverity_operations xfs_fsverity_ops;
> +#endif /* CONFIG_FS_VERITY */
> +
> +#endif /* __XFS_FSVERITY_H__ */
> diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
> index d7e2b902ef5c..033e1a71e64b 100644
> --- a/fs/xfs/xfs_inode.h
> +++ b/fs/xfs/xfs_inode.h
> @@ -404,6 +404,12 @@ static inline bool xfs_inode_can_hw_atomic_write(const struct xfs_inode *ip)
> */
> #define XFS_IREMAPPING (1U << 15)
>
> +/*
> + * fs-verity's Merkle tree is under construction. The file is read-only, the
> + * only writes happening is the ones with Merkle tree blocks.
> + */
> +#define XFS_VERITY_CONSTRUCTION (1U << 16)
> +
> /* All inode state flags related to inode reclaim. */
> #define XFS_ALL_IRECLAIM_FLAGS (XFS_IRECLAIMABLE | \
> XFS_IRECLAIM | \
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index 7b1ace75955c..10fb23ac672a 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -30,6 +30,7 @@
> #include "xfs_filestream.h"
> #include "xfs_quota.h"
> #include "xfs_sysfs.h"
> +#include "xfs_fsverity.h"
> #include "xfs_ondisk.h"
> #include "xfs_rmap_item.h"
> #include "xfs_refcount_item.h"
> @@ -54,6 +55,7 @@
> #include <linux/fs_context.h>
> #include <linux/fs_parser.h>
> #include <linux/fsverity.h>
> +#include <linux/iomap.h>
>
> static const struct super_operations xfs_super_operations;
>
> @@ -1711,6 +1713,9 @@ xfs_fs_fill_super(
> sb->s_quota_types = QTYPE_MASK_USR | QTYPE_MASK_GRP | QTYPE_MASK_PRJ;
> #endif
> sb->s_op = &xfs_super_operations;
> +#ifdef CONFIG_FS_VERITY
> + sb->s_vop = &xfs_fsverity_ops;
> +#endif
>
> /*
> * Delay mount work if the debug hook is set. This is debug
> @@ -1964,10 +1969,25 @@ xfs_fs_fill_super(
> xfs_set_resuming_quotaon(mp);
> mp->m_qflags &= ~XFS_QFLAGS_MNTOPTS;
>
> + if (xfs_has_verity(mp))
> + xfs_warn(mp,
> + "EXPERIMENTAL fsverity feature in use. Use at your own risk!");
> +
> error = xfs_mountfs(mp);
> if (error)
> goto out_filestream_unmount;
>
> +#ifdef CONFIG_FS_VERITY
> + /*
> + * Don't use a high priority workqueue like the other fsverity
> + * implementations because that will lead to conflicts with the xfs log
> + * workqueue.
> + */
> + error = iomap_init_fsverity(mp->m_super, 0, 0);
> + if (error)
> + goto out_unmount;
> +#endif
> +
> root = igrab(VFS_I(mp->m_rootip));
> if (!root) {
> error = -ENOENT;
>
> --
> 2.50.0
>
>
* Re: [PATCH RFC 28/29] xfs: add fsverity traces
2025-07-28 20:30 ` [PATCH RFC 28/29] xfs: add fsverity traces Andrey Albershteyn
@ 2025-07-29 23:06 ` Darrick J. Wong
0 siblings, 0 replies; 75+ messages in thread
From: Darrick J. Wong @ 2025-07-29 23:06 UTC (permalink / raw)
To: Andrey Albershteyn
Cc: fsverity, linux-fsdevel, linux-xfs, david, ebiggers, hch,
Andrey Albershteyn
On Mon, Jul 28, 2025 at 10:30:32PM +0200, Andrey Albershteyn wrote:
> Even though fsverity has traces, debugging issues with varying block
> sizes could be a bit less transparent without read/write traces.
>
> Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
Looks fine to me,
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
--D
> ---
> fs/xfs/xfs_fsverity.c | 8 ++++++++
> fs/xfs/xfs_trace.h | 46 ++++++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 54 insertions(+)
>
> diff --git a/fs/xfs/xfs_fsverity.c b/fs/xfs/xfs_fsverity.c
> index dfe7b0bcd97e..4d3fd00237b1 100644
> --- a/fs/xfs/xfs_fsverity.c
> +++ b/fs/xfs/xfs_fsverity.c
> @@ -70,6 +70,8 @@ xfs_fsverity_get_descriptor(
> };
> int error = 0;
>
> + trace_xfs_fsverity_get_descriptor(ip);
> +
> /*
> * The fact that (returned attribute size) == (provided buf_size) is
> * checked by xfs_attr_copy_value() (returns -ERANGE). No descriptor
> @@ -267,6 +269,8 @@ xfs_fsverity_read_merkle(
> */
> xfs_fsverity_adjust_read(&region);
>
> + trace_xfs_fsverity_read_merkle(XFS_I(inode), region.pos, region.length);
> +
> folio = iomap_read_region(&region);
> if (IS_ERR(folio))
> return ERR_PTR(-EIO);
> @@ -297,6 +301,8 @@ xfs_fsverity_write_merkle(
> .ops = &xfs_buffered_write_iomap_ops,
> };
>
> + trace_xfs_fsverity_write_merkle(XFS_I(inode), region.pos, region.length);
> +
> if (region.pos + region.length > inode->i_sb->s_maxbytes)
> return -EFBIG;
>
> @@ -309,6 +315,8 @@ xfs_fsverity_file_corrupt(
> loff_t pos,
> size_t len)
> {
> + trace_xfs_fsverity_file_corrupt(XFS_I(inode), pos, len);
> +
> xfs_inode_mark_sick(XFS_I(inode), XFS_SICK_INO_DATA);
> }
>
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index 50034c059e8c..4477d5412e53 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -5979,6 +5979,52 @@ DEFINE_EVENT(xfs_freeblocks_resv_class, name, \
> DEFINE_FREEBLOCKS_RESV_EVENT(xfs_freecounter_reserved);
> DEFINE_FREEBLOCKS_RESV_EVENT(xfs_freecounter_enospc);
>
> +TRACE_EVENT(xfs_fsverity_get_descriptor,
> + TP_PROTO(struct xfs_inode *ip),
> + TP_ARGS(ip),
> + TP_STRUCT__entry(
> + __field(dev_t, dev)
> + __field(xfs_ino_t, ino)
> + ),
> + TP_fast_assign(
> + __entry->dev = VFS_I(ip)->i_sb->s_dev;
> + __entry->ino = ip->i_ino;
> + ),
> + TP_printk("dev %d:%d ino 0x%llx",
> + MAJOR(__entry->dev), MINOR(__entry->dev),
> + __entry->ino)
> +);
> +
> +DECLARE_EVENT_CLASS(xfs_fsverity_class,
> + TP_PROTO(struct xfs_inode *ip, u64 pos, unsigned int length),
> + TP_ARGS(ip, pos, length),
> + TP_STRUCT__entry(
> + __field(dev_t, dev)
> + __field(xfs_ino_t, ino)
> + __field(u64, pos)
> + __field(unsigned int, length)
> + ),
> + TP_fast_assign(
> + __entry->dev = VFS_I(ip)->i_sb->s_dev;
> + __entry->ino = ip->i_ino;
> + __entry->pos = pos;
> + __entry->length = length;
> + ),
> + TP_printk("dev %d:%d ino 0x%llx pos %llx length %x",
> + MAJOR(__entry->dev), MINOR(__entry->dev),
> + __entry->ino,
> + __entry->pos,
> + __entry->length)
> +)
> +
> +#define DEFINE_FSVERITY_EVENT(name) \
> +DEFINE_EVENT(xfs_fsverity_class, name, \
> + TP_PROTO(struct xfs_inode *ip, u64 pos, unsigned int length), \
> + TP_ARGS(ip, pos, length))
> +DEFINE_FSVERITY_EVENT(xfs_fsverity_read_merkle);
> +DEFINE_FSVERITY_EVENT(xfs_fsverity_write_merkle);
> +DEFINE_FSVERITY_EVENT(xfs_fsverity_file_corrupt);
> +
> #endif /* _TRACE_XFS_H */
>
> #undef TRACE_INCLUDE_PATH
>
> --
> 2.50.0
>
>
* Re: [PATCH RFC 13/29] iomap: integrate fs-verity verification into iomap's read path
2025-07-28 20:30 ` [PATCH RFC 13/29] iomap: integrate fs-verity verification into iomap's read path Andrey Albershteyn
@ 2025-07-29 23:21 ` Darrick J. Wong
2025-07-31 11:34 ` Andrey Albershteyn
0 siblings, 1 reply; 75+ messages in thread
From: Darrick J. Wong @ 2025-07-29 23:21 UTC (permalink / raw)
To: Andrey Albershteyn
Cc: fsverity, linux-fsdevel, linux-xfs, david, ebiggers, hch,
Andrey Albershteyn
On Mon, Jul 28, 2025 at 10:30:17PM +0200, Andrey Albershteyn wrote:
> From: Andrey Albershteyn <aalbersh@redhat.com>
>
> This patch adds fs-verity verification into iomap's read path. After
> BIO's io operation is complete the data are verified against
> fs-verity's Merkle tree. Verification work is done in a separate
> workqueue.
>
> The read path ioend iomap_read_ioend are stored side by side with
> BIOs if FS_VERITY is enabled.
>
> [djwong: fix doc warning]
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
> ---
> fs/iomap/buffered-io.c | 151 +++++++++++++++++++++++++++++++++++++++++++++++--
> fs/iomap/ioend.c | 41 +++++++++++++-
> include/linux/iomap.h | 13 +++++
> 3 files changed, 198 insertions(+), 7 deletions(-)
>
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index e959a206cba9..87c974e543e0 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -6,6 +6,7 @@
> #include <linux/module.h>
> #include <linux/compiler.h>
> #include <linux/fs.h>
> +#include <linux/fsverity.h>
> #include <linux/iomap.h>
> #include <linux/pagemap.h>
> #include <linux/uio.h>
> @@ -363,6 +364,116 @@ static inline bool iomap_block_needs_zeroing(const struct iomap_iter *iter,
> pos >= i_size_read(iter->inode);
> }
>
> +#ifdef CONFIG_FS_VERITY
> +int iomap_init_fsverity(struct super_block *sb, unsigned int wq_flags,
> + int max_active)
> +{
> + int ret;
> +
> + if (!iomap_fsverity_bioset) {
> + ret = iomap_fsverity_init_bioset();
> + if (ret)
> + return ret;
> + }
> +
> + return fsverity_init_wq(sb, wq_flags, max_active);
> +}
> +EXPORT_SYMBOL_GPL(iomap_init_fsverity);
> +
> +static void
> +iomap_read_fsverify_end_io_work(struct work_struct *work)
> +{
> + struct iomap_fsverity_bio *fbio =
> + container_of(work, struct iomap_fsverity_bio, work);
> +
> + fsverity_verify_bio(&fbio->bio);
> + iomap_read_end_io(&fbio->bio);
> +}
> +
> +static void
> +iomap_read_fsverity_end_io(struct bio *bio)
> +{
> + struct iomap_fsverity_bio *fbio =
> + container_of(bio, struct iomap_fsverity_bio, bio);
> +
> + INIT_WORK(&fbio->work, iomap_read_fsverify_end_io_work);
> + queue_work(bio->bi_private, &fbio->work);
> +}
> +
> +static struct bio *
> +iomap_fsverity_read_bio_alloc(struct inode *inode, struct block_device *bdev,
> + int nr_vecs, gfp_t gfp)
> +{
> + struct bio *bio;
> +
> + bio = bio_alloc_bioset(bdev, nr_vecs, REQ_OP_READ, gfp,
> + iomap_fsverity_bioset);
> + if (bio) {
> + bio->bi_private = inode->i_sb->s_verity_wq;
> + bio->bi_end_io = iomap_read_fsverity_end_io;
> + }
> + return bio;
> +}
> +
> +/*
> + * True if tree is not aligned with fs block/folio size and we need zero tail
> + * part of the folio
> + */
> +static bool
> +iomap_fsverity_tree_end_align(struct iomap_iter *iter, struct folio *folio,
> + loff_t pos, size_t plen)
> +{
> + int error;
> + u8 log_blocksize;
> + u64 tree_size, tree_mask, last_block_tree, last_block_pos;
> +
> + /* Not a Merkle tree */
> + if (!(iter->iomap.flags & IOMAP_F_BEYOND_EOF))
> + return false;
> +
> + if (plen == folio_size(folio))
> + return false;
> +
> + if (iter->inode->i_blkbits == folio_shift(folio))
> + return false;
> +
> + error = fsverity_merkle_tree_geometry(iter->inode, &log_blocksize, NULL,
> + &tree_size);
> + if (error)
> + return false;
> +
> + /*
> + * We are beyond EOF reading Merkle tree. Therefore, it has highest
> + * offset. Mask pos with a tree size to get a position where we are in
> + * the tree. Then, compare index of a last tree block and the index of
> + * current pos block.
> + */
> + last_block_tree = (tree_size + PAGE_SIZE - 1) >> PAGE_SHIFT;
> + tree_mask = (1 << fls64(tree_size)) - 1;
> + last_block_pos = ((pos & tree_mask) >> PAGE_SHIFT) + 1;
> +
> + return last_block_tree == last_block_pos;
> +}
> +#else
> +# define iomap_fsverity_read_bio_alloc(...) (NULL)
> +# define iomap_fsverity_tree_end_align(...) (false)
> +#endif /* CONFIG_FS_VERITY */
> +
> +static struct bio *iomap_read_bio_alloc(struct inode *inode,
> + const struct iomap *iomap, int nr_vecs, gfp_t gfp)
> +{
> + struct bio *bio;
> + struct block_device *bdev = iomap->bdev;
> +
> + if (fsverity_active(inode) && !(iomap->flags & IOMAP_F_BEYOND_EOF))
> + return iomap_fsverity_read_bio_alloc(inode, bdev, nr_vecs, gfp);
> +
> + bio = bio_alloc(bdev, nr_vecs, REQ_OP_READ, gfp);
> + if (bio)
> + bio->bi_end_io = iomap_read_end_io;
> + return bio;
> +}
> +
> static int iomap_readpage_iter(struct iomap_iter *iter,
> struct iomap_readpage_ctx *ctx)
> {
> @@ -375,6 +486,10 @@ static int iomap_readpage_iter(struct iomap_iter *iter,
> sector_t sector;
> int ret;
>
> + /* Fail reads from broken fsverity files immediately. */
> + if (IS_VERITY(iter->inode) && !fsverity_active(iter->inode))
> + return -EIO;
> +
> if (iomap->type == IOMAP_INLINE) {
> ret = iomap_read_inline_data(iter, folio);
> if (ret)
> @@ -391,6 +506,11 @@ static int iomap_readpage_iter(struct iomap_iter *iter,
> if (iomap_block_needs_zeroing(iter, pos) &&
> !(iomap->flags & IOMAP_F_BEYOND_EOF)) {
> folio_zero_range(folio, poff, plen);
> + if (fsverity_active(iter->inode) &&
> + !fsverity_verify_blocks(folio, plen, poff)) {
> + return -EIO;
> + }
> +
> iomap_set_range_uptodate(folio, poff, plen);
> goto done;
> }
> @@ -408,32 +528,51 @@ static int iomap_readpage_iter(struct iomap_iter *iter,
> !bio_add_folio(ctx->bio, folio, plen, poff)) {
> gfp_t gfp = mapping_gfp_constraint(folio->mapping, GFP_KERNEL);
> gfp_t orig_gfp = gfp;
> - unsigned int nr_vecs = DIV_ROUND_UP(length, PAGE_SIZE);
>
> if (ctx->bio)
> submit_bio(ctx->bio);
>
> if (ctx->rac) /* same as readahead_gfp_mask */
> gfp |= __GFP_NORETRY | __GFP_NOWARN;
> - ctx->bio = bio_alloc(iomap->bdev, bio_max_segs(nr_vecs),
> - REQ_OP_READ, gfp);
> +
> + ctx->bio = iomap_read_bio_alloc(iter->inode, iomap,
> + bio_max_segs(DIV_ROUND_UP(length, PAGE_SIZE)),
> + gfp);
> +
> /*
> * If the bio_alloc fails, try it again for a single page to
> * avoid having to deal with partial page reads. This emulates
> * what do_mpage_read_folio does.
> */
> if (!ctx->bio) {
> - ctx->bio = bio_alloc(iomap->bdev, 1, REQ_OP_READ,
> - orig_gfp);
> + ctx->bio = iomap_read_bio_alloc(iter->inode,
> + iomap, 1, orig_gfp);
> }
> if (ctx->rac)
> ctx->bio->bi_opf |= REQ_RAHEAD;
> ctx->bio->bi_iter.bi_sector = sector;
> - ctx->bio->bi_end_io = iomap_read_end_io;
> bio_add_folio_nofail(ctx->bio, folio, plen, poff);
> }
>
> done:
> + /*
> + * For post EOF region, zero part of the folio which won't be read. This
> + * happens at the end of the region. So far, the only user is
> + * fs-verity which stores continuous data region.
Is it ever the case that the zeroed region actually has merkle tree
content on disk? Or if this region truly was never written by the
fsverity construction code, then why would it access the unwritten
region later?
Or am I misunderstanding something here?
(Probably...)
--D
> + * We also check the fs block size as plen could be smaller than the
> + * block size. If we zero part of the block and mark the whole block
> + * (iomap_set_range_uptodate() works with fsblocks) as uptodate the
> + * iomap_finish_folio_read() will toggle the uptodate bit when the folio
> + * is read.
> + */
> + if (iomap_fsverity_tree_end_align(iter, folio, pos, plen)) {
> + folio_zero_range(folio, poff + plen,
> + folio_size(folio) - (poff + plen));
> + iomap_set_range_uptodate(folio, poff + plen,
> + folio_size(folio) - (poff + plen));
> + }
> +
> /*
> * Move the caller beyond our range so that it keeps making progress.
> * For that, we have to include any leading non-uptodate ranges, but
> diff --git a/fs/iomap/ioend.c b/fs/iomap/ioend.c
> index 18894ebba6db..400751d82313 100644
> --- a/fs/iomap/ioend.c
> +++ b/fs/iomap/ioend.c
> @@ -6,6 +6,8 @@
> #include <linux/list_sort.h>
> #include "internal.h"
>
> +#define IOMAP_POOL_SIZE (4 * (PAGE_SIZE / SECTOR_SIZE))
> +
> struct bio_set iomap_ioend_bioset;
> EXPORT_SYMBOL_GPL(iomap_ioend_bioset);
>
> @@ -207,9 +209,46 @@ struct iomap_ioend *iomap_split_ioend(struct iomap_ioend *ioend,
> }
> EXPORT_SYMBOL_GPL(iomap_split_ioend);
>
> +#ifdef CONFIG_FS_VERITY
> +struct bio_set *iomap_fsverity_bioset;
> +EXPORT_SYMBOL_GPL(iomap_fsverity_bioset);
> +int iomap_fsverity_init_bioset(void)
> +{
> + struct bio_set *bs, *old;
> + int error;
> +
> + bs = kzalloc(sizeof(*bs), GFP_KERNEL);
> + if (!bs)
> + return -ENOMEM;
> +
> + error = bioset_init(bs, IOMAP_POOL_SIZE,
> + offsetof(struct iomap_fsverity_bio, bio),
> + BIOSET_NEED_BVECS);
> + if (error) {
> + kfree(bs);
> + return error;
> + }
> +
> + /*
> + * This has to be atomic as readaheads can race to create the
> + * bioset. If someone set the pointer before us, we drop ours.
> + */
> + old = cmpxchg(&iomap_fsverity_bioset, NULL, bs);
> + if (old) {
> + bioset_exit(bs);
> + kfree(bs);
> + }
> +
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(iomap_fsverity_init_bioset);
> +#else
> +# define iomap_fsverity_init_bioset(...) (-EOPNOTSUPP)
> +#endif
> +
> static int __init iomap_ioend_init(void)
> {
> - return bioset_init(&iomap_ioend_bioset, 4 * (PAGE_SIZE / SECTOR_SIZE),
> + return bioset_init(&iomap_ioend_bioset, IOMAP_POOL_SIZE,
> offsetof(struct iomap_ioend, io_bio),
> BIOSET_NEED_BVECS);
> }
> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index 73288f28543f..f845876ad8d2 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -339,6 +339,19 @@ static inline bool iomap_want_unshare_iter(const struct iomap_iter *iter)
> iter->srcmap.type == IOMAP_MAPPED;
> }
>
> +#ifdef CONFIG_FS_VERITY
> +extern struct bio_set *iomap_fsverity_bioset;
> +
> +struct iomap_fsverity_bio {
> + struct work_struct work;
> + struct bio bio;
> +};
> +
> +int iomap_init_fsverity(struct super_block *sb, unsigned int wq_flags,
> + int max_active);
> +int iomap_fsverity_init_bioset(void);
> +#endif
> +
> ssize_t iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *from,
> const struct iomap_ops *ops, void *private);
> int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops);
>
> --
> 2.50.0
>
>
* Re: [PATCH RFC 13/29] iomap: integrate fs-verity verification into iomap's read path
2025-07-29 23:21 ` Darrick J. Wong
@ 2025-07-31 11:34 ` Andrey Albershteyn
2025-07-31 14:52 ` Darrick J. Wong
0 siblings, 1 reply; 75+ messages in thread
From: Andrey Albershteyn @ 2025-07-31 11:34 UTC (permalink / raw)
To: Darrick J. Wong
Cc: fsverity, linux-fsdevel, linux-xfs, david, ebiggers, hch,
Andrey Albershteyn
On 2025-07-29 16:21:52, Darrick J. Wong wrote:
> On Mon, Jul 28, 2025 at 10:30:17PM +0200, Andrey Albershteyn wrote:
> > From: Andrey Albershteyn <aalbersh@redhat.com>
> >
> > This patch adds fs-verity verification into iomap's read path. After
> > BIO's io operation is complete the data are verified against
> > fs-verity's Merkle tree. Verification work is done in a separate
> > workqueue.
> >
> > The read path ioend iomap_read_ioend are stored side by side with
> > BIOs if FS_VERITY is enabled.
> >
> > [djwong: fix doc warning]
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
> > ---
> > fs/iomap/buffered-io.c | 151 +++++++++++++++++++++++++++++++++++++++++++++++--
> > fs/iomap/ioend.c | 41 +++++++++++++-
> > include/linux/iomap.h | 13 +++++
> > 3 files changed, 198 insertions(+), 7 deletions(-)
> >
> > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> > index e959a206cba9..87c974e543e0 100644
> > --- a/fs/iomap/buffered-io.c
> > +++ b/fs/iomap/buffered-io.c
> > @@ -6,6 +6,7 @@
> > #include <linux/module.h>
> > #include <linux/compiler.h>
> > #include <linux/fs.h>
> > +#include <linux/fsverity.h>
> > #include <linux/iomap.h>
> > #include <linux/pagemap.h>
> > #include <linux/uio.h>
> > @@ -363,6 +364,116 @@ static inline bool iomap_block_needs_zeroing(const struct iomap_iter *iter,
> > pos >= i_size_read(iter->inode);
> > }
> >
> > +#ifdef CONFIG_FS_VERITY
> > +int iomap_init_fsverity(struct super_block *sb, unsigned int wq_flags,
> > + int max_active)
> > +{
> > + int ret;
> > +
> > + if (!iomap_fsverity_bioset) {
> > + ret = iomap_fsverity_init_bioset();
> > + if (ret)
> > + return ret;
> > + }
> > +
> > + return fsverity_init_wq(sb, wq_flags, max_active);
> > +}
> > +EXPORT_SYMBOL_GPL(iomap_init_fsverity);
> > +
> > +static void
> > +iomap_read_fsverify_end_io_work(struct work_struct *work)
> > +{
> > + struct iomap_fsverity_bio *fbio =
> > + container_of(work, struct iomap_fsverity_bio, work);
> > +
> > + fsverity_verify_bio(&fbio->bio);
> > + iomap_read_end_io(&fbio->bio);
> > +}
> > +
> > +static void
> > +iomap_read_fsverity_end_io(struct bio *bio)
> > +{
> > + struct iomap_fsverity_bio *fbio =
> > + container_of(bio, struct iomap_fsverity_bio, bio);
> > +
> > + INIT_WORK(&fbio->work, iomap_read_fsverify_end_io_work);
> > + queue_work(bio->bi_private, &fbio->work);
> > +}
> > +
> > +static struct bio *
> > +iomap_fsverity_read_bio_alloc(struct inode *inode, struct block_device *bdev,
> > + int nr_vecs, gfp_t gfp)
> > +{
> > + struct bio *bio;
> > +
> > + bio = bio_alloc_bioset(bdev, nr_vecs, REQ_OP_READ, gfp,
> > + iomap_fsverity_bioset);
> > + if (bio) {
> > + bio->bi_private = inode->i_sb->s_verity_wq;
> > + bio->bi_end_io = iomap_read_fsverity_end_io;
> > + }
> > + return bio;
> > +}
> > +
> > +/*
> > + * True if tree is not aligned with fs block/folio size and we need zero tail
> > + * part of the folio
> > + */
> > +static bool
> > +iomap_fsverity_tree_end_align(struct iomap_iter *iter, struct folio *folio,
> > + loff_t pos, size_t plen)
> > +{
> > + int error;
> > + u8 log_blocksize;
> > + u64 tree_size, tree_mask, last_block_tree, last_block_pos;
> > +
> > + /* Not a Merkle tree */
> > + if (!(iter->iomap.flags & IOMAP_F_BEYOND_EOF))
> > + return false;
> > +
> > + if (plen == folio_size(folio))
> > + return false;
> > +
> > + if (iter->inode->i_blkbits == folio_shift(folio))
> > + return false;
> > +
> > + error = fsverity_merkle_tree_geometry(iter->inode, &log_blocksize, NULL,
> > + &tree_size);
> > + if (error)
> > + return false;
> > +
> > + /*
> > + * We are beyond EOF reading Merkle tree. Therefore, it has highest
> > + * offset. Mask pos with a tree size to get a position where we are in
> > + * the tree. Then, compare index of a last tree block and the index of
> > + * current pos block.
> > + */
> > + last_block_tree = (tree_size + PAGE_SIZE - 1) >> PAGE_SHIFT;
> > + tree_mask = (1 << fls64(tree_size)) - 1;
> > + last_block_pos = ((pos & tree_mask) >> PAGE_SHIFT) + 1;
> > +
> > + return last_block_tree == last_block_pos;
> > +}
> > +#else
> > +# define iomap_fsverity_read_bio_alloc(...) (NULL)
> > +# define iomap_fsverity_tree_end_align(...) (false)
> > +#endif /* CONFIG_FS_VERITY */
> > +
> > +static struct bio *iomap_read_bio_alloc(struct inode *inode,
> > + const struct iomap *iomap, int nr_vecs, gfp_t gfp)
> > +{
> > + struct bio *bio;
> > + struct block_device *bdev = iomap->bdev;
> > +
> > + if (fsverity_active(inode) && !(iomap->flags & IOMAP_F_BEYOND_EOF))
> > + return iomap_fsverity_read_bio_alloc(inode, bdev, nr_vecs, gfp);
> > +
> > + bio = bio_alloc(bdev, nr_vecs, REQ_OP_READ, gfp);
> > + if (bio)
> > + bio->bi_end_io = iomap_read_end_io;
> > + return bio;
> > +}
> > +
> > static int iomap_readpage_iter(struct iomap_iter *iter,
> > struct iomap_readpage_ctx *ctx)
> > {
> > @@ -375,6 +486,10 @@ static int iomap_readpage_iter(struct iomap_iter *iter,
> > sector_t sector;
> > int ret;
> >
> > + /* Fail reads from broken fsverity files immediately. */
> > + if (IS_VERITY(iter->inode) && !fsverity_active(iter->inode))
> > + return -EIO;
> > +
> > if (iomap->type == IOMAP_INLINE) {
> > ret = iomap_read_inline_data(iter, folio);
> > if (ret)
> > @@ -391,6 +506,11 @@ static int iomap_readpage_iter(struct iomap_iter *iter,
> > if (iomap_block_needs_zeroing(iter, pos) &&
> > !(iomap->flags & IOMAP_F_BEYOND_EOF)) {
> > folio_zero_range(folio, poff, plen);
> > + if (fsverity_active(iter->inode) &&
> > + !fsverity_verify_blocks(folio, plen, poff)) {
> > + return -EIO;
> > + }
> > +
> > iomap_set_range_uptodate(folio, poff, plen);
> > goto done;
> > }
> > @@ -408,32 +528,51 @@ static int iomap_readpage_iter(struct iomap_iter *iter,
> > !bio_add_folio(ctx->bio, folio, plen, poff)) {
> > gfp_t gfp = mapping_gfp_constraint(folio->mapping, GFP_KERNEL);
> > gfp_t orig_gfp = gfp;
> > - unsigned int nr_vecs = DIV_ROUND_UP(length, PAGE_SIZE);
> >
> > if (ctx->bio)
> > submit_bio(ctx->bio);
> >
> > if (ctx->rac) /* same as readahead_gfp_mask */
> > gfp |= __GFP_NORETRY | __GFP_NOWARN;
> > - ctx->bio = bio_alloc(iomap->bdev, bio_max_segs(nr_vecs),
> > - REQ_OP_READ, gfp);
> > +
> > + ctx->bio = iomap_read_bio_alloc(iter->inode, iomap,
> > + bio_max_segs(DIV_ROUND_UP(length, PAGE_SIZE)),
> > + gfp);
> > +
> > /*
> > * If the bio_alloc fails, try it again for a single page to
> > * avoid having to deal with partial page reads. This emulates
> > * what do_mpage_read_folio does.
> > */
> > if (!ctx->bio) {
> > - ctx->bio = bio_alloc(iomap->bdev, 1, REQ_OP_READ,
> > - orig_gfp);
> > + ctx->bio = iomap_read_bio_alloc(iter->inode,
> > + iomap, 1, orig_gfp);
> > }
> > if (ctx->rac)
> > ctx->bio->bi_opf |= REQ_RAHEAD;
> > ctx->bio->bi_iter.bi_sector = sector;
> > - ctx->bio->bi_end_io = iomap_read_end_io;
> > bio_add_folio_nofail(ctx->bio, folio, plen, poff);
> > }
> >
> > done:
> > + /*
> > + * For post EOF region, zero part of the folio which won't be read. This
> > + * happens at the end of the region. So far, the only user is
> > + * fs-verity which stores continuous data region.
>
> Is it ever the case that the zeroed region actually has merkle tree
> content on disk? Or if this region truly was never written by the
> fsverity construction code, then why would it access the unwritten
> region later?
>
> Or am I misunderstanding something here?
>
> (Probably...)
The zeroed region is never written. With a 1k fs block size, a 1k merkle
tree block size and a 4k page size, we could end up reading only a single
block at the end of the tree. But we have to pass a full page to
fsverity, so only 1/4 of the page is read. This if-case zeroes
the rest of the page to make it uptodate. In the normal read path
this is bounded by EOF, but here I use the tree size. So we don't read
this unwritten region, we only zero the folio.
fsverity zeroes unused space during construction, but this only
works for full fs blocks, i.e. the case of a 4k fs block and a 1k
merkle tree block.
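The split described above can be sketched in userspace as follows. This is only an illustration of the arithmetic, assuming the tree is one contiguous byte range; the struct and function names are hypothetical and this is not the iomap code from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct tail_split {
	size_t read_bytes;	/* bytes of the page backed by tree data */
	size_t zero_bytes;	/* bytes of the page that must be zeroed */
};

/*
 * For the page starting at byte offset `page_pos` within the tree, work out
 * how much holds tree data and how much lies past the end of the tree.
 */
static struct tail_split split_tree_page(uint64_t tree_size, uint64_t page_pos,
					 size_t page_size)
{
	struct tail_split s;
	uint64_t remaining;

	assert(page_pos < tree_size);
	assert(page_pos % page_size == 0);

	remaining = tree_size - page_pos;
	s.read_bytes = remaining < page_size ? (size_t)remaining : page_size;
	s.zero_bytes = page_size - s.read_bytes;
	return s;
}
```

For a tree of five 1k blocks (5120 bytes) and 4k pages, the second page reads 1024 bytes and zeroes the remaining 3072, which is the 1/4-page case mentioned above.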
--
- Andrey
* Re: [PATCH RFC 20/29] xfs: disable preallocations for fsverity Merkle tree writes
2025-07-29 22:27 ` Darrick J. Wong
@ 2025-07-31 11:42 ` Andrey Albershteyn
2025-07-31 14:49 ` Darrick J. Wong
0 siblings, 1 reply; 75+ messages in thread
From: Andrey Albershteyn @ 2025-07-31 11:42 UTC (permalink / raw)
To: Darrick J. Wong
Cc: fsverity, linux-fsdevel, linux-xfs, david, ebiggers, hch,
Andrey Albershteyn
On 2025-07-29 15:27:36, Darrick J. Wong wrote:
> On Mon, Jul 28, 2025 at 10:30:24PM +0200, Andrey Albershteyn wrote:
> > While writing Merkle tree, file is read-only and there's no further
> > writes except Merkle tree building. The file is truncated beforehand to
> > remove any preallocated extents.
> >
> > The Merkle tree is the only data XFS will write. As we don't want XFS to
> > truncate file after we done writing, let's also skip truncation on
> > fsverity files. Therefore, we also need to disable preallocations while
> > writing merkle tree as we don't want any unused extents past the tree.
> >
> > Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
> > ---
> > fs/xfs/xfs_iomap.c | 13 ++++++++++++-
> > 1 file changed, 12 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> > index ff05e6b1b0bb..00ec1a738b39 100644
> > --- a/fs/xfs/xfs_iomap.c
> > +++ b/fs/xfs/xfs_iomap.c
> > @@ -32,6 +32,8 @@
> > #include "xfs_rtbitmap.h"
> > #include "xfs_icache.h"
> > #include "xfs_zone_alloc.h"
> > +#include "xfs_fsverity.h"
> > +#include <linux/fsverity.h>
>
> What do these includes pull in for the iflags tests below?
They probably need to be removed, thanks for noting.
>
> > #define XFS_ALLOC_ALIGN(mp, off) \
> > (((off) >> mp->m_allocsize_log) << mp->m_allocsize_log)
> > @@ -1849,7 +1851,9 @@ xfs_buffered_write_iomap_begin(
> > * Determine the initial size of the preallocation.
> > * We clean up any extra preallocation when the file is closed.
> > */
> > - if (xfs_has_allocsize(mp))
> > + if (xfs_iflags_test(ip, XFS_VERITY_CONSTRUCTION))
> > + prealloc_blocks = 0;
> > + else if (xfs_has_allocsize(mp))
> > prealloc_blocks = mp->m_allocsize_blocks;
> > else if (allocfork == XFS_DATA_FORK)
> > prealloc_blocks = xfs_iomap_prealloc_size(ip, allocfork,
> > @@ -1976,6 +1980,13 @@ xfs_buffered_write_iomap_end(
> > if (flags & IOMAP_FAULT)
> > return 0;
> >
> > + /*
> > + * While writing Merkle tree to disk we would not have any other
> > + * delayed allocations
> > + */
> > + if (xfs_iflags_test(XFS_I(inode), XFS_VERITY_CONSTRUCTION))
> > + return 0;
>
> I assume XFS_VERITY_CONSTRUCTION doesn't get set until after we've
> locked the inode, flushed the dirty pagecache, and truncated the file to
> EOF? In which case I guess this is ok -- we're never going to have new
> delalloc reservations,
yes, this is my thinking here
> and verity data can't be straddling the EOF
> folio, no matter how large it is. Right?
Not sure what you mean here. In the page cache the Merkle tree is
stored at offset (1 << 53), and there's a check for the file
overlapping this in patch 22, in xfs_fsverity_begin_enable().
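As a rough illustration of that layout, a userspace sketch of the
offset mapping and the size check could look like this. The two macro
values are taken from patch 22; the function names are hypothetical
and only model what the patch does with XFS_FSVERITY_MTREE_OFFSET and
XFS_FSVERITY_MTREE_MASK.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Values from the quoted patch 22; everything else is illustrative. */
#define MTREE_OFFSET	(1ULL << 53)
#define MTREE_MASK	(MTREE_OFFSET - 1)

/* Map an offset inside the tree (below 1 << 53) to its page cache position. */
static uint64_t tree_to_pagecache(uint64_t tree_pos)
{
	return tree_pos | MTREE_OFFSET;
}

/* Recover the offset inside the tree from a page cache position. */
static uint64_t pagecache_to_tree(uint64_t pos)
{
	return pos & MTREE_MASK;
}

/* Model of the file size check done in xfs_fsverity_begin_enable(). */
static int check_file_size(uint64_t i_size)
{
	return i_size > MTREE_OFFSET ? -EFBIG : 0;
}
```

The round trip only works while the tree itself stays below 1 << 53
bytes, which is why the tree size matters for the choice of offset.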
--
- Andrey
* Re: [PATCH RFC 20/29] xfs: disable preallocations for fsverity Merkle tree writes
2025-07-31 11:42 ` Andrey Albershteyn
@ 2025-07-31 14:49 ` Darrick J. Wong
0 siblings, 0 replies; 75+ messages in thread
From: Darrick J. Wong @ 2025-07-31 14:49 UTC (permalink / raw)
To: Andrey Albershteyn
Cc: fsverity, linux-fsdevel, linux-xfs, david, ebiggers, hch,
Andrey Albershteyn
On Thu, Jul 31, 2025 at 01:42:54PM +0200, Andrey Albershteyn wrote:
> On 2025-07-29 15:27:36, Darrick J. Wong wrote:
> > On Mon, Jul 28, 2025 at 10:30:24PM +0200, Andrey Albershteyn wrote:
> > > While writing Merkle tree, file is read-only and there's no further
> > > writes except Merkle tree building. The file is truncated beforehand to
> > > remove any preallocated extents.
> > >
> > > The Merkle tree is the only data XFS will write. As we don't want XFS to
> > > truncate file after we done writing, let's also skip truncation on
> > > fsverity files. Therefore, we also need to disable preallocations while
> > > writing merkle tree as we don't want any unused extents past the tree.
> > >
> > > Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
> > > ---
> > > fs/xfs/xfs_iomap.c | 13 ++++++++++++-
> > > 1 file changed, 12 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> > > index ff05e6b1b0bb..00ec1a738b39 100644
> > > --- a/fs/xfs/xfs_iomap.c
> > > +++ b/fs/xfs/xfs_iomap.c
> > > @@ -32,6 +32,8 @@
> > > #include "xfs_rtbitmap.h"
> > > #include "xfs_icache.h"
> > > #include "xfs_zone_alloc.h"
> > > +#include "xfs_fsverity.h"
> > > +#include <linux/fsverity.h>
> >
> > What do these includes pull in for the iflags tests below?
>
> Probably need to be removed, thanks for noting
>
> >
> > > #define XFS_ALLOC_ALIGN(mp, off) \
> > > (((off) >> mp->m_allocsize_log) << mp->m_allocsize_log)
> > > @@ -1849,7 +1851,9 @@ xfs_buffered_write_iomap_begin(
> > > * Determine the initial size of the preallocation.
> > > * We clean up any extra preallocation when the file is closed.
> > > */
> > > - if (xfs_has_allocsize(mp))
> > > + if (xfs_iflags_test(ip, XFS_VERITY_CONSTRUCTION))
> > > + prealloc_blocks = 0;
> > > + else if (xfs_has_allocsize(mp))
> > > prealloc_blocks = mp->m_allocsize_blocks;
> > > else if (allocfork == XFS_DATA_FORK)
> > > prealloc_blocks = xfs_iomap_prealloc_size(ip, allocfork,
> > > @@ -1976,6 +1980,13 @@ xfs_buffered_write_iomap_end(
> > > if (flags & IOMAP_FAULT)
> > > return 0;
> > >
> > > + /*
> > > + * While writing Merkle tree to disk we would not have any other
> > > + * delayed allocations
> > > + */
> > > + if (xfs_iflags_test(XFS_I(inode), XFS_VERITY_CONSTRUCTION))
> > > + return 0;
> >
> > I assume XFS_VERITY_CONSTRUCTION doesn't get set until after we've
> > locked the inode, flushed the dirty pagecache, and truncated the file to
> > EOF? In which case I guess this is ok -- we're never going to have new
> > delalloc reservations,
>
> yes, this is my thinking here
>
> > and verity data can't be straddling the EOF
> > folio, no matter how large it is. Right?
>
> Not sure, what you mean here. In page cache merkle tree is stored
> at (1 << 53) offset, and there's check for file overlapping this in
> patch 22 xfs_fsverity_begin_enable().
Yeah, I hadn't gotten to that patch yet. That part looks fine to me,
though I sorta wondered why not put it at offset 1<<62 to allow for even
bigger files. But the impression I've gotten from Eric is that they
don't really want to handle files that huge with merkle trees that big
so the loss of file range address space probably doesn't matter.
--D
> --
> - Andrey
>
>
* Re: [PATCH RFC 22/29] xfs: add fs-verity support
2025-07-29 23:05 ` Darrick J. Wong
@ 2025-07-31 14:50 ` Andrey Albershteyn
2025-07-31 15:07 ` Darrick J. Wong
0 siblings, 1 reply; 75+ messages in thread
From: Andrey Albershteyn @ 2025-07-31 14:50 UTC (permalink / raw)
To: Darrick J. Wong
Cc: fsverity, linux-fsdevel, linux-xfs, david, ebiggers, hch,
Andrey Albershteyn
On 2025-07-29 16:05:36, Darrick J. Wong wrote:
> On Mon, Jul 28, 2025 at 10:30:26PM +0200, Andrey Albershteyn wrote:
> > Add integration with fs-verity. XFS stores fs-verity descriptor in
> > the extended file attributes and the Merkle tree is store in data fork
> > beyond EOF.
> >
> > The descriptor is stored under "vdesc" extended attribute.
> >
> > The Merkle tree reading/writing is done through iomap interface. The
> > data itself are at offset 1 << 53 in the inode's page cache. When XFS
> > reads from this region iomap doesn't call into fsverity to verify it
> > against Merkle tree. For data, verification is done on BIO completion.
> >
> > When fs-verity is enabled on an inode, the XFS_IVERITY_CONSTRUCTION
> > flag is set meaning that the Merkle tree is being build. The
> > initialization ends with storing of verity descriptor and setting
> > inode on-disk flag (XFS_DIFLAG2_VERITY).
> >
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
>
> /me wonders if this is really SoB me since it's rather different from
> the last time I even touched this patch. It also means I can't RVB it.
> ;)
Yeah, it probably makes sense to drop it, because I dropped most of
the blob cache related code.
>
> > Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
> > ---
> > fs/xfs/Makefile | 1 +
> > fs/xfs/libxfs/xfs_da_format.h | 4 +
> > fs/xfs/xfs_bmap_util.c | 7 +
> > fs/xfs/xfs_fsverity.c | 311 ++++++++++++++++++++++++++++++++++++++++++
> > fs/xfs/xfs_fsverity.h | 28 ++++
> > fs/xfs/xfs_inode.h | 6 +
> > fs/xfs/xfs_super.c | 20 +++
> > 7 files changed, 377 insertions(+)
> >
> > diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
> > index 5bf501cf8271..ad66439db7bf 100644
> > --- a/fs/xfs/Makefile
> > +++ b/fs/xfs/Makefile
> > @@ -147,6 +147,7 @@ xfs-$(CONFIG_XFS_POSIX_ACL) += xfs_acl.o
> > xfs-$(CONFIG_SYSCTL) += xfs_sysctl.o
> > xfs-$(CONFIG_COMPAT) += xfs_ioctl32.o
> > xfs-$(CONFIG_EXPORTFS_BLOCK_OPS) += xfs_pnfs.o
> > +xfs-$(CONFIG_FS_VERITY) += xfs_fsverity.o
> >
> > # notify failure
> > ifeq ($(CONFIG_MEMORY_FAILURE),y)
> > diff --git a/fs/xfs/libxfs/xfs_da_format.h b/fs/xfs/libxfs/xfs_da_format.h
> > index e5274be2fe9c..b17fdbbb48aa 100644
> > --- a/fs/xfs/libxfs/xfs_da_format.h
> > +++ b/fs/xfs/libxfs/xfs_da_format.h
> > @@ -909,4 +909,8 @@ struct xfs_parent_rec {
> > __be32 p_gen;
> > } __packed;
> >
> > +/* fs-verity ondisk xattr name used for the fsverity descriptor */
> > +#define XFS_VERITY_DESCRIPTOR_NAME "vdesc"
> > +#define XFS_VERITY_DESCRIPTOR_NAME_LEN (sizeof(XFS_VERITY_DESCRIPTOR_NAME) - 1)
> > +
> > #endif /* __XFS_DA_FORMAT_H__ */
> > diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
> > index 06ca11731e43..af1933129647 100644
> > --- a/fs/xfs/xfs_bmap_util.c
> > +++ b/fs/xfs/xfs_bmap_util.c
> > @@ -31,6 +31,7 @@
> > #include "xfs_rtbitmap.h"
> > #include "xfs_rtgroup.h"
> > #include "xfs_zone_alloc.h"
> > +#include <linux/fsverity.h>
> >
> > /* Kernel only BMAP related definitions and functions */
> >
> > @@ -553,6 +554,12 @@ xfs_can_free_eofblocks(
> > if (last_fsb <= end_fsb)
> > return false;
> >
> > + /*
> > + * Nothing to clean on fsverity inodes as they are read-only
> > + */
> > + if (IS_VERITY(VFS_I(ip)))
> > + return false;
> > +
> > /*
> > * Check if there is an post-EOF extent to free. If there are any
> > * delalloc blocks attached to the inode (data fork delalloc
> > diff --git a/fs/xfs/xfs_fsverity.c b/fs/xfs/xfs_fsverity.c
> > new file mode 100644
> > index 000000000000..ec7dea4289d5
> > --- /dev/null
> > +++ b/fs/xfs/xfs_fsverity.c
> > @@ -0,0 +1,311 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +/*
> > + * Copyright (C) 2023 Red Hat, Inc.
> > + */
> > +#include "xfs.h"
> > +#include "xfs_shared.h"
> > +#include "xfs_format.h"
> > +#include "xfs_da_format.h"
> > +#include "xfs_da_btree.h"
> > +#include "xfs_trans_resv.h"
> > +#include "xfs_mount.h"
> > +#include "xfs_inode.h"
> > +#include "xfs_log_format.h"
> > +#include "xfs_attr.h"
> > +#include "xfs_bmap_util.h"
> > +#include "xfs_log_format.h"
> > +#include "xfs_trans.h"
> > +#include "xfs_attr_leaf.h"
> > +#include "xfs_trace.h"
> > +#include "xfs_quota.h"
> > +#include "xfs_ag.h"
> > +#include "xfs_fsverity.h"
> > +#include "xfs_iomap.h"
> > +#include <linux/fsverity.h>
> > +#include <linux/pagemap.h>
> > +
> > +/*
> > + * Initialize an args structure to load or store the fsverity descriptor.
> > + * Caller must ensure @args is zeroed except for value and valuelen.
> > + */
> > +static inline void
> > +xfs_fsverity_init_vdesc_args(
> > + struct xfs_inode *ip,
> > + struct xfs_da_args *args)
> > +{
> > + args->geo = ip->i_mount->m_attr_geo;
> > + args->whichfork = XFS_ATTR_FORK;
> > + args->attr_filter = XFS_ATTR_VERITY;
> > + args->op_flags = XFS_DA_OP_OKNOENT;
> > + args->dp = ip;
> > + args->owner = ip->i_ino;
> > + args->name = XFS_VERITY_DESCRIPTOR_NAME;
> > + args->namelen = XFS_VERITY_DESCRIPTOR_NAME_LEN;
> > + xfs_attr_sethash(args);
> > +}
> > +
> > +/* Delete the verity descriptor. */
> > +static int
> > +xfs_fsverity_delete_descriptor(
> > + struct xfs_inode *ip)
> > +{
> > + struct xfs_da_args args = { };
> > +
> > + xfs_fsverity_init_vdesc_args(ip, &args);
> > + return xfs_attr_set(&args, XFS_ATTRUPDATE_REMOVE, false);
> > +}
> > +
> > +/* Retrieve the verity descriptor. */
> > +static int
> > +xfs_fsverity_get_descriptor(
> > + struct inode *inode,
> > + void *buf,
> > + size_t buf_size)
> > +{
> > + struct xfs_inode *ip = XFS_I(inode);
> > + struct xfs_da_args args = {
> > + .value = buf,
> > + .valuelen = buf_size,
> > + };
> > + int error = 0;
> > +
> > + /*
> > + * The fact that (returned attribute size) == (provided buf_size) is
> > + * checked by xfs_attr_copy_value() (returns -ERANGE). No descriptor
> > + * is treated as a short read so that common fsverity code will
> > + * complain.
> > + */
> > + xfs_fsverity_init_vdesc_args(ip, &args);
> > + error = xfs_attr_get(&args);
> > + if (error == -ENOATTR)
> > + return 0;
> > + if (error)
> > + return error;
> > +
> > + return args.valuelen;
> > +}
> > +
> > +/* Try to remove all the fsverity metadata after a failed enablement. */
> > +static int
> > +xfs_fsverity_delete_metadata(
> > + struct xfs_inode *ip)
> > +{
> > + struct xfs_trans *tp;
> > + struct xfs_mount *mp = ip->i_mount;
> > + int error;
> > +
> > + error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, 0, 0, 0, &tp);
> > + if (error) {
> > + ASSERT(xfs_is_shutdown(mp));
> > + return error;
> > + }
> > +
> > + xfs_ilock(ip, XFS_ILOCK_EXCL);
> > + xfs_trans_ijoin(tp, ip, 0);
> > +
> > + /*
> > + * As only merkle tree is getting removed, no need to change inode size
> > + */
> > + error = xfs_itruncate_extents(&tp, ip, XFS_DATA_FORK, XFS_ISIZE(ip));
> > + if (error)
> > + goto err_cancel;
> > +
> > + error = xfs_trans_commit(tp);
> > + xfs_iunlock(ip, XFS_ILOCK_EXCL);
> > + if (error)
> > + return error;
> > +
> > + error = xfs_fsverity_delete_descriptor(ip);
> > + return error != -ENOATTR ? error : 0;
>
> Do you need to re-truncate the file to EOF to clear out the post-eof
> blocks?
Yes, any prealloc extents or extents left over from a failed Merkle
tree construction.
>
> > +
> > +err_cancel:
> > + xfs_trans_cancel(tp);
> > + return error;
> > +}
> > +
> > +
> > +/* Prepare to enable fsverity by clearing old metadata. */
> > +static int
> > +xfs_fsverity_begin_enable(
> > + struct file *filp)
> > +{
> > + struct inode *inode = file_inode(filp);
> > + struct xfs_inode *ip = XFS_I(inode);
> > + int error;
> > +
> > + xfs_assert_ilocked(ip, XFS_IOLOCK_EXCL);
> > +
> > + if (IS_DAX(inode))
> > + return -EINVAL;
> > +
> > + if (inode->i_size > XFS_FSVERITY_MTREE_OFFSET)
> > + return -EFBIG;
> > +
> > + if (xfs_iflags_test_and_set(ip, XFS_VERITY_CONSTRUCTION))
> > + return -EBUSY;
>
> Hm. Do we actually flush the pagecache to disk in the _begin_enable
> path? /me looked briefly but didn't see anything.
Hmm, you're right, this is missing. I was under the impression that
the fsverity code does it, but it doesn't.
I'm not sure it's actually needed here, though, as we do
filemap_write_and_wait() in _end_enable() while the inode is locked
and the only writing happening is the Merkle tree construction.
>
> > +
> > + error = xfs_qm_dqattach(ip);
> > + if (error)
> > + return error;
> > +
> > + return xfs_fsverity_delete_metadata(ip);
> > +}
> > +
> > +/* Complete (or fail) the process of enabling fsverity. */
> > +static int
> > +xfs_fsverity_end_enable(
> > + struct file *filp,
> > + const void *desc,
> > + size_t desc_size,
> > + u64 merkle_tree_size)
> > +{
> > + struct xfs_da_args args = {
> > + .value = (void *)desc,
> > + .valuelen = desc_size,
> > + };
> > + struct inode *inode = file_inode(filp);
> > + struct xfs_inode *ip = XFS_I(inode);
> > + struct xfs_mount *mp = ip->i_mount;
> > + struct xfs_trans *tp;
> > + int error = 0;
> > +
> > + xfs_assert_ilocked(ip, XFS_IOLOCK_EXCL);
> > +
> > + /* fs-verity failed, just cleanup */
> > + if (desc == NULL)
> > + goto out;
> > +
> > + xfs_fsverity_init_vdesc_args(ip, &args);
> > + error = xfs_attr_set(&args, XFS_ATTRUPDATE_UPSERT, false);
> > + if (error)
> > + goto out;
> > +
> > + /*
> > + * Wait for Merkel tree get written to disk before setting on-disk inode
> > + * flag and clering XFS_VERITY_CONSTRUCTION
>
> s/Merkel/Merkle/
>
> s/clering/clearing/ , sorry if that's my typo
thanks!
>
> > + */
> > + error = filemap_write_and_wait(inode->i_mapping);
> > + if (error)
> > + return error;
> > +
> > + /* Set fsverity inode flag */
> > + error = xfs_trans_alloc_inode(ip, &M_RES(mp)->tr_ichange,
> > + 0, 0, false, &tp);
> > + if (error)
> > + goto out;
> > +
> > + /*
> > + * Ensure that we've persisted the verity information before we enable
> > + * it on the inode and tell the caller we have sealed the inode.
> > + */
> > + ip->i_diflags2 |= XFS_DIFLAG2_VERITY;
> > +
> > + xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
> > + xfs_trans_set_sync(tp);
> > +
> > + error = xfs_trans_commit(tp);
> > + xfs_iunlock(ip, XFS_ILOCK_EXCL);
> > +
> > + if (!error)
> > + inode->i_flags |= S_VERITY;
> > +
> > +out:
> > + if (error) {
> > + int error2;
> > +
> > + error2 = xfs_fsverity_delete_metadata(ip);
> > + if (error2)
> > + xfs_alert(ip->i_mount,
> > +"ino 0x%llx failed to clean up new fsverity metadata, err %d",
> > + ip->i_ino, error2);
> > + }
> > +
> > + xfs_iflags_clear(ip, XFS_VERITY_CONSTRUCTION);
> > + return error;
> > +}
> > +
> > +static void
> > +xfs_fsverity_adjust_read(
> > + struct ioregion *region)
> > +{
> > + u8 log_blocksize;
> > + unsigned int block_size;
> > + u64 tree_size;
> > + u64 position = region->pos & XFS_FSVERITY_MTREE_MASK;
> > +
> > + fsverity_merkle_tree_geometry(region->inode, &log_blocksize,
> > + &block_size, &tree_size);
> > +
> > + if (position + region->length < tree_size)
> > + return;
> > +
> > + region->length = tree_size - position;
> > +}
> > +
> > +/* Retrieve a merkle tree block. */
> > +static struct page *
> > +xfs_fsverity_read_merkle(
> > + struct inode *inode,
> > + pgoff_t index,
> > + unsigned long num_ra_pages)
> > +{
> > + u64 position = (index << PAGE_SHIFT) | XFS_FSVERITY_MTREE_OFFSET;
>
> Is this shift safe on 32-bit where pgoff_t is unsigned long?
> I don't think it is....?
u64 position = ((loff_t)index << PAGE_SHIFT) | XFS_FSVERITY_MTREE_OFFSET;
like this?
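To show the overflow in question, here is a small userspace sketch.
It assumes 4k pages and uses uint32_t as a stand-in for a 32-bit
pgoff_t; pos_unsafe() and pos_safe() are names invented for this
example.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT	12		/* assumed 4k pages */
#define MTREE_OFFSET	(1ULL << 53)

/* uint32_t stands in for pgoff_t (unsigned long) on a 32-bit kernel. */
static uint64_t pos_unsafe(uint32_t index)
{
	/* The shift is done in 32 bits, so the high bits are lost. */
	return (uint32_t)(index << PAGE_SHIFT) | MTREE_OFFSET;
}

static uint64_t pos_safe(uint32_t index)
{
	/* Widen first, then shift, as in the suggested fix. */
	return ((uint64_t)index << PAGE_SHIFT) | MTREE_OFFSET;
}
```

With index = 1 << 20 the unwidened shift wraps to zero and the
position collapses onto the start of the tree, while the widened
version keeps bit 32 set.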
>
> > + struct ioregion region = {
> > + .inode = inode,
> > + .pos = position,
> > + .length = PAGE_SIZE,
> > + .ops = &xfs_read_iomap_ops,
> > + };
> > + int error;
> > + struct folio *folio;
> > +
> > + /*
> > + * As region->length is PAGE_SIZE we have to adjust the length for the
> > + * end of the tree. The case when tree blocks size are smaller then
> > + * PAGE_SIZE.
>
> Hrm. I wonder if we want to try to create large folios for merkle
> trees?
I think we can, we would just need to pick the right page that
fsverity requested.
>
> Since the comment references PAGE_SIZE, what happens if this is an 8k
> fsblock XFS filesystem where folios will always be at least 8k?
I think this should be fine. The default is PAGE_SIZE because
fsverity is interested in pages, so we start from that and adjust if
we don't have enough data to fill it.
>
> > + */
> > + xfs_fsverity_adjust_read(&region);
> > +
> > + folio = iomap_read_region(&region);
> > + if (IS_ERR(folio))
> > + return ERR_PTR(-EIO);
> > +
> > + /* Wait for buffered read to finish */
> > + error = folio_wait_locked_killable(folio);
> > + if (error)
> > + return ERR_PTR(error);
> > + if (IS_ERR(folio) || !folio_test_uptodate(folio))
> > + return ERR_PTR(-EFSCORRUPTED);
> > +
> > + return folio_file_page(folio, 0);
> > +}
> > +
> > +/* Write a merkle tree block. */
> > +static int
> > +xfs_fsverity_write_merkle(
> > + struct inode *inode,
> > + const void *buf,
> > + u64 pos,
> > + unsigned int size)
> > +{
> > + struct ioregion region = {
> > + .inode = inode,
> > + .pos = pos | XFS_FSVERITY_MTREE_OFFSET,
> > + .buf = buf,
> > + .length = size,
> > + .ops = &xfs_buffered_write_iomap_ops,
> > + };
> > +
> > + if (region.pos + region.length > inode->i_sb->s_maxbytes)
> > + return -EFBIG;
> > +
> > + return iomap_write_region(&region);
> > +}
> > +
> > +const struct fsverity_operations xfs_fsverity_ops = {
> > + .begin_enable_verity = xfs_fsverity_begin_enable,
> > + .end_enable_verity = xfs_fsverity_end_enable,
> > + .get_verity_descriptor = xfs_fsverity_get_descriptor,
> > + .read_merkle_tree_page = xfs_fsverity_read_merkle,
> > + .write_merkle_tree_block = xfs_fsverity_write_merkle,
> > +};
> > diff --git a/fs/xfs/xfs_fsverity.h b/fs/xfs/xfs_fsverity.h
> > new file mode 100644
> > index 000000000000..e063b7288dc0
> > --- /dev/null
> > +++ b/fs/xfs/xfs_fsverity.h
> > @@ -0,0 +1,28 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +/*
> > + * Copyright (C) 2022 Red Hat, Inc.
> > + */
> > +#ifndef __XFS_FSVERITY_H__
> > +#define __XFS_FSVERITY_H__
> > +
> > +/* Merkle tree location in page cache. We take memory region from the inode's
> > + * address space for Merkle tree.
> > + *
> > + * At maximum of 8 levels with 128 hashes per block (32 bytes SHA-256) maximum
>
> Aren't sha512 hashes allowed too? In which case it would be 64 bytes
> per hash?
>
> I suppose that doesn't really matter since... an 8EB file on a 1k
> fsblock filesystem with 1k merkle tree blocks will do this:
>
> # xfs_db /dev/sda -c "btheight -b 1024 -n $(( (2**63 - 1) / 1024 ))
> 64:64:0:merkle -w min"
> 64:64:0:merkle: worst case per 1024-byte block: 8 records (leaf) / 8
> keyptrs (node)
> level 0: 9007199254740991 records, 1125899906842624 blocks
> level 1: 1125899906842624 records, 140737488355328 blocks
> level 2: 140737488355328 records, 17592186044416 blocks
> level 3: 17592186044416 records, 2199023255552 blocks
> level 4: 2199023255552 records, 274877906944 blocks
> level 5: 274877906944 records, 34359738368 blocks
> level 6: 34359738368 records, 4294967296 blocks
> level 7: 4294967296 records, 536870912 blocks
> level 8: 536870912 records, 67108864 blocks
> level 9: 67108864 records, 8388608 blocks
> level 10: 8388608 records, 1048576 blocks
> level 11: 1048576 records, 131072 blocks
> level 12: 131072 records, 16384 blocks
> level 13: 16384 records, 2048 blocks
> level 14: 2048 records, 256 blocks
> level 15: 256 records, 32 blocks
> level 16: 32 records, 4 blocks
> level 17: 4 records, 1 block
> 18 levels, 1286742750677285 blocks total
>
> log2(1286742750677285) is 50, so your choice of 53 is fine. Though this
> will far exceed the max merkle tree height anyway.
yup, the 53 seems to be fine even for 1k.
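A back-of-the-envelope check of the tree size can also be done with a
few lines of userspace C. This is only a sketch under simple
assumptions: every level packs block_size / hash_size hashes per
block, levels are added until a single block remains, and no limit on
the number of levels is modelled. merkle_tree_blocks() is a made-up
helper, not an fs-verity function.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical estimate of the number of Merkle tree blocks needed for
 * a file of file_size bytes. A file that fits in one block needs no
 * tree blocks in this model.
 */
static uint64_t merkle_tree_blocks(uint64_t file_size, uint32_t block_size,
				   uint32_t hash_size)
{
	uint64_t hashes_per_block = block_size / hash_size;
	uint64_t blocks = (file_size + block_size - 1) / block_size;
	uint64_t total = 0;

	/* Each level hashes the blocks of the level below it. */
	while (blocks > 1) {
		blocks = (blocks + hashes_per_block - 1) / hashes_per_block;
		total += blocks;
	}
	return total;
}
```

Under this model a file at the 1 << 53 size limit with 1k blocks and
64-byte hashes needs fewer than 2^40 tree blocks, i.e. less than 2^50
bytes, so the tree offset still fits under the 53-bit mask.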
>
> > + * tree size is ((128^8 − 1)/(128 − 1)) = 567*10^12 blocks. This should fit in 53
> > + * bits address space.
> > + *
> > + * At this Merkle tree size we can cover 295EB large file. This is much larger
> > + * than the currently supported file size.
> > + *
> > + * As we allocate some pagecache space for fsverity tree we need to limit
> > + * maximum fsverity file size.
> > + */
> > +#define XFS_FSVERITY_MTREE_OFFSET (1ULL << 53)
> > +#define XFS_FSVERITY_MTREE_MASK (XFS_FSVERITY_MTREE_OFFSET - 1)
>
> This is part of the ondisk format, so it belongs in xfs_fs.h.
oh, will move it
>
> --D
>
> > +#ifdef CONFIG_FS_VERITY
> > +extern const struct fsverity_operations xfs_fsverity_ops;
> > +#endif /* CONFIG_FS_VERITY */
> > +
> > +#endif /* __XFS_FSVERITY_H__ */
> > diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
> > index d7e2b902ef5c..033e1a71e64b 100644
> > --- a/fs/xfs/xfs_inode.h
> > +++ b/fs/xfs/xfs_inode.h
> > @@ -404,6 +404,12 @@ static inline bool xfs_inode_can_hw_atomic_write(const struct xfs_inode *ip)
> > */
> > #define XFS_IREMAPPING (1U << 15)
> >
> > +/*
> > + * fs-verity's Merkle tree is under construction. The file is read-only, the
> > + * only writes happening is the ones with Merkle tree blocks.
> > + */
> > +#define XFS_VERITY_CONSTRUCTION (1U << 16)
> > +
> > /* All inode state flags related to inode reclaim. */
> > #define XFS_ALL_IRECLAIM_FLAGS (XFS_IRECLAIMABLE | \
> > XFS_IRECLAIM | \
> > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> > index 7b1ace75955c..10fb23ac672a 100644
> > --- a/fs/xfs/xfs_super.c
> > +++ b/fs/xfs/xfs_super.c
> > @@ -30,6 +30,7 @@
> > #include "xfs_filestream.h"
> > #include "xfs_quota.h"
> > #include "xfs_sysfs.h"
> > +#include "xfs_fsverity.h"
> > #include "xfs_ondisk.h"
> > #include "xfs_rmap_item.h"
> > #include "xfs_refcount_item.h"
> > @@ -54,6 +55,7 @@
> > #include <linux/fs_context.h>
> > #include <linux/fs_parser.h>
> > #include <linux/fsverity.h>
> > +#include <linux/iomap.h>
> >
> > static const struct super_operations xfs_super_operations;
> >
> > @@ -1711,6 +1713,9 @@ xfs_fs_fill_super(
> > sb->s_quota_types = QTYPE_MASK_USR | QTYPE_MASK_GRP | QTYPE_MASK_PRJ;
> > #endif
> > sb->s_op = &xfs_super_operations;
> > +#ifdef CONFIG_FS_VERITY
> > + sb->s_vop = &xfs_fsverity_ops;
> > +#endif
> >
> > /*
> > * Delay mount work if the debug hook is set. This is debug
> > @@ -1964,10 +1969,25 @@ xfs_fs_fill_super(
> > xfs_set_resuming_quotaon(mp);
> > mp->m_qflags &= ~XFS_QFLAGS_MNTOPTS;
> >
> > + if (xfs_has_verity(mp))
> > + xfs_warn(mp,
> > + "EXPERIMENTAL fsverity feature in use. Use at your own risk!");
> > +
> > error = xfs_mountfs(mp);
> > if (error)
> > goto out_filestream_unmount;
> >
> > +#ifdef CONFIG_FS_VERITY
> > + /*
> > + * Don't use a high priority workqueue like the other fsverity
> > + * implementations because that will lead to conflicts with the xfs log
> > + * workqueue.
> > + */
> > + error = iomap_init_fsverity(mp->m_super, 0, 0);
> > + if (error)
> > + goto out_unmount;
> > +#endif
> > +
> > root = igrab(VFS_I(mp->m_rootip));
> > if (!root) {
> > error = -ENOENT;
> >
> > --
> > 2.50.0
> >
> >
>
--
- Andrey
* Re: [PATCH RFC 13/29] iomap: integrate fs-verity verification into iomap's read path
2025-07-31 11:34 ` Andrey Albershteyn
@ 2025-07-31 14:52 ` Darrick J. Wong
2025-07-31 15:01 ` Andrey Albershteyn
0 siblings, 1 reply; 75+ messages in thread
From: Darrick J. Wong @ 2025-07-31 14:52 UTC (permalink / raw)
To: Andrey Albershteyn
Cc: fsverity, linux-fsdevel, linux-xfs, david, ebiggers, hch,
Andrey Albershteyn
On Thu, Jul 31, 2025 at 01:34:24PM +0200, Andrey Albershteyn wrote:
> On 2025-07-29 16:21:52, Darrick J. Wong wrote:
> > On Mon, Jul 28, 2025 at 10:30:17PM +0200, Andrey Albershteyn wrote:
> > > From: Andrey Albershteyn <aalbersh@redhat.com>
> > >
> > > This patch adds fs-verity verification into iomap's read path. After
> > > BIO's io operation is complete the data are verified against
> > > fs-verity's Merkle tree. Verification work is done in a separate
> > > workqueue.
> > >
> > > The read path ioend iomap_read_ioend are stored side by side with
> > > BIOs if FS_VERITY is enabled.
> > >
> > > [djwong: fix doc warning]
> > > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > > Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
> > > ---
> > > fs/iomap/buffered-io.c | 151 +++++++++++++++++++++++++++++++++++++++++++++++--
> > > fs/iomap/ioend.c | 41 +++++++++++++-
> > > include/linux/iomap.h | 13 +++++
> > > 3 files changed, 198 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> > > index e959a206cba9..87c974e543e0 100644
> > > --- a/fs/iomap/buffered-io.c
> > > +++ b/fs/iomap/buffered-io.c
> > > @@ -6,6 +6,7 @@
> > > #include <linux/module.h>
> > > #include <linux/compiler.h>
> > > #include <linux/fs.h>
> > > +#include <linux/fsverity.h>
> > > #include <linux/iomap.h>
> > > #include <linux/pagemap.h>
> > > #include <linux/uio.h>
> > > @@ -363,6 +364,116 @@ static inline bool iomap_block_needs_zeroing(const struct iomap_iter *iter,
> > > pos >= i_size_read(iter->inode);
> > > }
> > >
> > > +#ifdef CONFIG_FS_VERITY
> > > +int iomap_init_fsverity(struct super_block *sb, unsigned int wq_flags,
> > > + int max_active)
> > > +{
> > > + int ret;
> > > +
> > > + if (!iomap_fsverity_bioset) {
> > > + ret = iomap_fsverity_init_bioset();
> > > + if (ret)
> > > + return ret;
> > > + }
> > > +
> > > + return fsverity_init_wq(sb, wq_flags, max_active);
> > > +}
> > > +EXPORT_SYMBOL_GPL(iomap_init_fsverity);
> > > +
> > > +static void
> > > +iomap_read_fsverify_end_io_work(struct work_struct *work)
> > > +{
> > > + struct iomap_fsverity_bio *fbio =
> > > + container_of(work, struct iomap_fsverity_bio, work);
> > > +
> > > + fsverity_verify_bio(&fbio->bio);
> > > + iomap_read_end_io(&fbio->bio);
> > > +}
> > > +
> > > +static void
> > > +iomap_read_fsverity_end_io(struct bio *bio)
> > > +{
> > > + struct iomap_fsverity_bio *fbio =
> > > + container_of(bio, struct iomap_fsverity_bio, bio);
> > > +
> > > + INIT_WORK(&fbio->work, iomap_read_fsverify_end_io_work);
> > > + queue_work(bio->bi_private, &fbio->work);
> > > +}
> > > +
> > > +static struct bio *
> > > +iomap_fsverity_read_bio_alloc(struct inode *inode, struct block_device *bdev,
> > > + int nr_vecs, gfp_t gfp)
> > > +{
> > > + struct bio *bio;
> > > +
> > > + bio = bio_alloc_bioset(bdev, nr_vecs, REQ_OP_READ, gfp,
> > > + iomap_fsverity_bioset);
> > > + if (bio) {
> > > + bio->bi_private = inode->i_sb->s_verity_wq;
> > > + bio->bi_end_io = iomap_read_fsverity_end_io;
> > > + }
> > > + return bio;
> > > +}
> > > +
> > > +/*
> > > + * True if tree is not aligned with fs block/folio size and we need zero tail
> > > + * part of the folio
> > > + */
> > > +static bool
> > > +iomap_fsverity_tree_end_align(struct iomap_iter *iter, struct folio *folio,
> > > + loff_t pos, size_t plen)
> > > +{
> > > + int error;
> > > + u8 log_blocksize;
> > > + u64 tree_size, tree_mask, last_block_tree, last_block_pos;
> > > +
> > > + /* Not a Merkle tree */
> > > + if (!(iter->iomap.flags & IOMAP_F_BEYOND_EOF))
> > > + return false;
> > > +
> > > + if (plen == folio_size(folio))
> > > + return false;
> > > +
> > > + if (iter->inode->i_blkbits == folio_shift(folio))
> > > + return false;
> > > +
> > > + error = fsverity_merkle_tree_geometry(iter->inode, &log_blocksize, NULL,
> > > + &tree_size);
> > > + if (error)
> > > + return false;
> > > +
> > > + /*
> > > + * We are beyond EOF reading Merkle tree. Therefore, it has highest
> > > + * offset. Mask pos with a tree size to get a position whare are we in
> > > + * the tree. Then, compare index of a last tree block and the index of
> > > + * current pos block.
> > > + */
> > > + last_block_tree = (tree_size + PAGE_SIZE - 1) >> PAGE_SHIFT;
> > > + tree_mask = (1 << fls64(tree_size)) - 1;
> > > + last_block_pos = ((pos & tree_mask) >> PAGE_SHIFT) + 1;
> > > +
> > > + return last_block_tree == last_block_pos;
> > > +}
> > > +#else
> > > +# define iomap_fsverity_read_bio_alloc(...) (NULL)
> > > +# define iomap_fsverity_tree_end_align(...) (false)
> > > +#endif /* CONFIG_FS_VERITY */
> > > +
> > > +static struct bio *iomap_read_bio_alloc(struct inode *inode,
> > > + const struct iomap *iomap, int nr_vecs, gfp_t gfp)
> > > +{
> > > + struct bio *bio;
> > > + struct block_device *bdev = iomap->bdev;
> > > +
> > > + if (fsverity_active(inode) && !(iomap->flags & IOMAP_F_BEYOND_EOF))
> > > + return iomap_fsverity_read_bio_alloc(inode, bdev, nr_vecs, gfp);
> > > +
> > > + bio = bio_alloc(bdev, nr_vecs, REQ_OP_READ, gfp);
> > > + if (bio)
> > > + bio->bi_end_io = iomap_read_end_io;
> > > + return bio;
> > > +}
> > > +
> > > static int iomap_readpage_iter(struct iomap_iter *iter,
> > > struct iomap_readpage_ctx *ctx)
> > > {
> > > @@ -375,6 +486,10 @@ static int iomap_readpage_iter(struct iomap_iter *iter,
> > > sector_t sector;
> > > int ret;
> > >
> > > + /* Fail reads from broken fsverity files immediately. */
> > > + if (IS_VERITY(iter->inode) && !fsverity_active(iter->inode))
> > > + return -EIO;
> > > +
> > > if (iomap->type == IOMAP_INLINE) {
> > > ret = iomap_read_inline_data(iter, folio);
> > > if (ret)
> > > @@ -391,6 +506,11 @@ static int iomap_readpage_iter(struct iomap_iter *iter,
> > > if (iomap_block_needs_zeroing(iter, pos) &&
> > > !(iomap->flags & IOMAP_F_BEYOND_EOF)) {
> > > folio_zero_range(folio, poff, plen);
> > > + if (fsverity_active(iter->inode) &&
> > > + !fsverity_verify_blocks(folio, plen, poff)) {
> > > + return -EIO;
> > > + }
> > > +
> > > iomap_set_range_uptodate(folio, poff, plen);
> > > goto done;
> > > }
> > > @@ -408,32 +528,51 @@ static int iomap_readpage_iter(struct iomap_iter *iter,
> > > !bio_add_folio(ctx->bio, folio, plen, poff)) {
> > > gfp_t gfp = mapping_gfp_constraint(folio->mapping, GFP_KERNEL);
> > > gfp_t orig_gfp = gfp;
> > > - unsigned int nr_vecs = DIV_ROUND_UP(length, PAGE_SIZE);
> > >
> > > if (ctx->bio)
> > > submit_bio(ctx->bio);
> > >
> > > if (ctx->rac) /* same as readahead_gfp_mask */
> > > gfp |= __GFP_NORETRY | __GFP_NOWARN;
> > > - ctx->bio = bio_alloc(iomap->bdev, bio_max_segs(nr_vecs),
> > > - REQ_OP_READ, gfp);
> > > +
> > > + ctx->bio = iomap_read_bio_alloc(iter->inode, iomap,
> > > + bio_max_segs(DIV_ROUND_UP(length, PAGE_SIZE)),
> > > + gfp);
> > > +
> > > /*
> > > * If the bio_alloc fails, try it again for a single page to
> > > * avoid having to deal with partial page reads. This emulates
> > > * what do_mpage_read_folio does.
> > > */
> > > if (!ctx->bio) {
> > > - ctx->bio = bio_alloc(iomap->bdev, 1, REQ_OP_READ,
> > > - orig_gfp);
> > > + ctx->bio = iomap_read_bio_alloc(iter->inode,
> > > + iomap, 1, orig_gfp);
> > > }
> > > if (ctx->rac)
> > > ctx->bio->bi_opf |= REQ_RAHEAD;
> > > ctx->bio->bi_iter.bi_sector = sector;
> > > - ctx->bio->bi_end_io = iomap_read_end_io;
> > > bio_add_folio_nofail(ctx->bio, folio, plen, poff);
> > > }
> > >
> > > done:
> > > + /*
> > > + * For post EOF region, zero part of the folio which won't be read. This
> > > + * happens at the end of the region. So far, the only user is
> > > + * fs-verity which stores continuous data region.
> >
> > Is it ever the case that the zeroed region actually has merkle tree
> > content on disk? Or if this region truly was never written by the
> > fsverity construction code, then why would it access the unwritten
> > region later?
> >
> > Or am I misunderstanding something here?
> >
> > (Probably...)
>
> The zeroed region is never written. With 1k fs block and 1k merkle
> tree block and 4k page size we could end up reading only single
> block, at the end of the tree. But we have to pass PAGE to the
> fsverity. So, only the 1/4 of the page is read. This if-case zeroes
> the rest of the page to make it uptodate. In the normal read path
> this is bound by EOF, but here I use tree size. So, we don't read
> this unwritten region but zero the folio.
>
> The fsverity does zeroing of unused space while construction, but
> this works only for full fs blocks, therefore, 4k fs block and 1k
> merkle tree block.
But if the regions you have to zero are outside the merkle tree, then
fsverity shouldn't ever see those bytes, so why zero them? Or does it
actually check the uptodate bit? So then you want the folio to have
well defined contents?
(At this point I'm picking at nits :P)
--D
> --
> - Andrey
>
>
^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [PATCH RFC 26/29] xfs: fix scrub trace with null pointer in quotacheck
2025-07-29 15:28 ` Darrick J. Wong
@ 2025-07-31 14:54 ` Andrey Albershteyn
2025-07-31 16:03 ` Carlos Maiolino
0 siblings, 1 reply; 75+ messages in thread
From: Andrey Albershteyn @ 2025-07-31 14:54 UTC (permalink / raw)
To: Darrick J. Wong, cem
Cc: fsverity, linux-fsdevel, linux-xfs, david, ebiggers, hch,
Andrey Albershteyn
On 2025-07-29 08:28:39, Darrick J. Wong wrote:
> On Mon, Jul 28, 2025 at 10:30:30PM +0200, Andrey Albershteyn wrote:
> > The quotacheck doesn't initialize sc->ip.
> >
> > Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
>
> Looks good,
> Cc: <stable@vger.kernel.org> # v6.8
> Fixes: 21d7500929c8a0 ("xfs: improve dquot iteration for scrub")
> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
@cem, could you pick this one up? Or, if you prefer, I can resend it
separately with the tags.
--
- Andrey
^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [PATCH RFC 13/29] iomap: integrate fs-verity verification into iomap's read path
2025-07-31 14:52 ` Darrick J. Wong
@ 2025-07-31 15:01 ` Andrey Albershteyn
2025-07-31 15:08 ` Darrick J. Wong
0 siblings, 1 reply; 75+ messages in thread
From: Andrey Albershteyn @ 2025-07-31 15:01 UTC (permalink / raw)
To: Darrick J. Wong
Cc: fsverity, linux-fsdevel, linux-xfs, david, ebiggers, hch,
Andrey Albershteyn
On 2025-07-31 07:52:33, Darrick J. Wong wrote:
> On Thu, Jul 31, 2025 at 01:34:24PM +0200, Andrey Albershteyn wrote:
> > On 2025-07-29 16:21:52, Darrick J. Wong wrote:
> > > On Mon, Jul 28, 2025 at 10:30:17PM +0200, Andrey Albershteyn wrote:
> > > > From: Andrey Albershteyn <aalbersh@redhat.com>
> > > >
> > > > This patch adds fs-verity verification into iomap's read path. After
> > > > BIO's io operation is complete the data are verified against
> > > > fs-verity's Merkle tree. Verification work is done in a separate
> > > > workqueue.
> > > >
> > > > The read path ioend iomap_read_ioend are stored side by side with
> > > > BIOs if FS_VERITY is enabled.
> > > >
> > > > [djwong: fix doc warning]
> > > > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > > > Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
> > > > ---
> > > > fs/iomap/buffered-io.c | 151 +++++++++++++++++++++++++++++++++++++++++++++++--
> > > > fs/iomap/ioend.c | 41 +++++++++++++-
> > > > include/linux/iomap.h | 13 +++++
> > > > 3 files changed, 198 insertions(+), 7 deletions(-)
> > > >
> > > > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> > > > index e959a206cba9..87c974e543e0 100644
> > > > --- a/fs/iomap/buffered-io.c
> > > > +++ b/fs/iomap/buffered-io.c
> > > > @@ -6,6 +6,7 @@
> > > > #include <linux/module.h>
> > > > #include <linux/compiler.h>
> > > > #include <linux/fs.h>
> > > > +#include <linux/fsverity.h>
> > > > #include <linux/iomap.h>
> > > > #include <linux/pagemap.h>
> > > > #include <linux/uio.h>
> > > > @@ -363,6 +364,116 @@ static inline bool iomap_block_needs_zeroing(const struct iomap_iter *iter,
> > > > pos >= i_size_read(iter->inode);
> > > > }
> > > >
> > > > +#ifdef CONFIG_FS_VERITY
> > > > +int iomap_init_fsverity(struct super_block *sb, unsigned int wq_flags,
> > > > + int max_active)
> > > > +{
> > > > + int ret;
> > > > +
> > > > + if (!iomap_fsverity_bioset) {
> > > > + ret = iomap_fsverity_init_bioset();
> > > > + if (ret)
> > > > + return ret;
> > > > + }
> > > > +
> > > > + return fsverity_init_wq(sb, wq_flags, max_active);
> > > > +}
> > > > +EXPORT_SYMBOL_GPL(iomap_init_fsverity);
> > > > +
> > > > +static void
> > > > +iomap_read_fsverify_end_io_work(struct work_struct *work)
> > > > +{
> > > > + struct iomap_fsverity_bio *fbio =
> > > > + container_of(work, struct iomap_fsverity_bio, work);
> > > > +
> > > > + fsverity_verify_bio(&fbio->bio);
> > > > + iomap_read_end_io(&fbio->bio);
> > > > +}
> > > > +
> > > > +static void
> > > > +iomap_read_fsverity_end_io(struct bio *bio)
> > > > +{
> > > > + struct iomap_fsverity_bio *fbio =
> > > > + container_of(bio, struct iomap_fsverity_bio, bio);
> > > > +
> > > > + INIT_WORK(&fbio->work, iomap_read_fsverify_end_io_work);
> > > > + queue_work(bio->bi_private, &fbio->work);
> > > > +}
> > > > +
> > > > +static struct bio *
> > > > +iomap_fsverity_read_bio_alloc(struct inode *inode, struct block_device *bdev,
> > > > + int nr_vecs, gfp_t gfp)
> > > > +{
> > > > + struct bio *bio;
> > > > +
> > > > + bio = bio_alloc_bioset(bdev, nr_vecs, REQ_OP_READ, gfp,
> > > > + iomap_fsverity_bioset);
> > > > + if (bio) {
> > > > + bio->bi_private = inode->i_sb->s_verity_wq;
> > > > + bio->bi_end_io = iomap_read_fsverity_end_io;
> > > > + }
> > > > + return bio;
> > > > +}
> > > > +
> > > > +/*
> > > > + * True if tree is not aligned with fs block/folio size and we need zero tail
> > > > + * part of the folio
> > > > + */
> > > > +static bool
> > > > +iomap_fsverity_tree_end_align(struct iomap_iter *iter, struct folio *folio,
> > > > + loff_t pos, size_t plen)
> > > > +{
> > > > + int error;
> > > > + u8 log_blocksize;
> > > > + u64 tree_size, tree_mask, last_block_tree, last_block_pos;
> > > > +
> > > > + /* Not a Merkle tree */
> > > > + if (!(iter->iomap.flags & IOMAP_F_BEYOND_EOF))
> > > > + return false;
> > > > +
> > > > + if (plen == folio_size(folio))
> > > > + return false;
> > > > +
> > > > + if (iter->inode->i_blkbits == folio_shift(folio))
> > > > + return false;
> > > > +
> > > > + error = fsverity_merkle_tree_geometry(iter->inode, &log_blocksize, NULL,
> > > > + &tree_size);
> > > > + if (error)
> > > > + return false;
> > > > +
> > > > + /*
> > > > + * We are beyond EOF reading Merkle tree. Therefore, it has highest
> > > > + * offset. Mask pos with a tree size to get a position whare are we in
> > > > + * the tree. Then, compare index of a last tree block and the index of
> > > > + * current pos block.
> > > > + */
> > > > + last_block_tree = (tree_size + PAGE_SIZE - 1) >> PAGE_SHIFT;
> > > > + tree_mask = (1 << fls64(tree_size)) - 1;
> > > > + last_block_pos = ((pos & tree_mask) >> PAGE_SHIFT) + 1;
> > > > +
> > > > + return last_block_tree == last_block_pos;
> > > > +}
> > > > +#else
> > > > +# define iomap_fsverity_read_bio_alloc(...) (NULL)
> > > > +# define iomap_fsverity_tree_end_align(...) (false)
> > > > +#endif /* CONFIG_FS_VERITY */
> > > > +
> > > > +static struct bio *iomap_read_bio_alloc(struct inode *inode,
> > > > + const struct iomap *iomap, int nr_vecs, gfp_t gfp)
> > > > +{
> > > > + struct bio *bio;
> > > > + struct block_device *bdev = iomap->bdev;
> > > > +
> > > > + if (fsverity_active(inode) && !(iomap->flags & IOMAP_F_BEYOND_EOF))
> > > > + return iomap_fsverity_read_bio_alloc(inode, bdev, nr_vecs, gfp);
> > > > +
> > > > + bio = bio_alloc(bdev, nr_vecs, REQ_OP_READ, gfp);
> > > > + if (bio)
> > > > + bio->bi_end_io = iomap_read_end_io;
> > > > + return bio;
> > > > +}
> > > > +
> > > > static int iomap_readpage_iter(struct iomap_iter *iter,
> > > > struct iomap_readpage_ctx *ctx)
> > > > {
> > > > @@ -375,6 +486,10 @@ static int iomap_readpage_iter(struct iomap_iter *iter,
> > > > sector_t sector;
> > > > int ret;
> > > >
> > > > + /* Fail reads from broken fsverity files immediately. */
> > > > + if (IS_VERITY(iter->inode) && !fsverity_active(iter->inode))
> > > > + return -EIO;
> > > > +
> > > > if (iomap->type == IOMAP_INLINE) {
> > > > ret = iomap_read_inline_data(iter, folio);
> > > > if (ret)
> > > > @@ -391,6 +506,11 @@ static int iomap_readpage_iter(struct iomap_iter *iter,
> > > > if (iomap_block_needs_zeroing(iter, pos) &&
> > > > !(iomap->flags & IOMAP_F_BEYOND_EOF)) {
> > > > folio_zero_range(folio, poff, plen);
> > > > + if (fsverity_active(iter->inode) &&
> > > > + !fsverity_verify_blocks(folio, plen, poff)) {
> > > > + return -EIO;
> > > > + }
> > > > +
> > > > iomap_set_range_uptodate(folio, poff, plen);
> > > > goto done;
> > > > }
> > > > @@ -408,32 +528,51 @@ static int iomap_readpage_iter(struct iomap_iter *iter,
> > > > !bio_add_folio(ctx->bio, folio, plen, poff)) {
> > > > gfp_t gfp = mapping_gfp_constraint(folio->mapping, GFP_KERNEL);
> > > > gfp_t orig_gfp = gfp;
> > > > - unsigned int nr_vecs = DIV_ROUND_UP(length, PAGE_SIZE);
> > > >
> > > > if (ctx->bio)
> > > > submit_bio(ctx->bio);
> > > >
> > > > if (ctx->rac) /* same as readahead_gfp_mask */
> > > > gfp |= __GFP_NORETRY | __GFP_NOWARN;
> > > > - ctx->bio = bio_alloc(iomap->bdev, bio_max_segs(nr_vecs),
> > > > - REQ_OP_READ, gfp);
> > > > +
> > > > + ctx->bio = iomap_read_bio_alloc(iter->inode, iomap,
> > > > + bio_max_segs(DIV_ROUND_UP(length, PAGE_SIZE)),
> > > > + gfp);
> > > > +
> > > > /*
> > > > * If the bio_alloc fails, try it again for a single page to
> > > > * avoid having to deal with partial page reads. This emulates
> > > > * what do_mpage_read_folio does.
> > > > */
> > > > if (!ctx->bio) {
> > > > - ctx->bio = bio_alloc(iomap->bdev, 1, REQ_OP_READ,
> > > > - orig_gfp);
> > > > + ctx->bio = iomap_read_bio_alloc(iter->inode,
> > > > + iomap, 1, orig_gfp);
> > > > }
> > > > if (ctx->rac)
> > > > ctx->bio->bi_opf |= REQ_RAHEAD;
> > > > ctx->bio->bi_iter.bi_sector = sector;
> > > > - ctx->bio->bi_end_io = iomap_read_end_io;
> > > > bio_add_folio_nofail(ctx->bio, folio, plen, poff);
> > > > }
> > > >
> > > > done:
> > > > + /*
> > > > + * For post EOF region, zero part of the folio which won't be read. This
> > > > + * happens at the end of the region. So far, the only user is
> > > > + * fs-verity which stores continuous data region.
> > >
> > > Is it ever the case that the zeroed region actually has merkle tree
> > > content on disk? Or if this region truly was never written by the
> > > fsverity construction code, then why would it access the unwritten
> > > region later?
> > >
> > > Or am I misunderstanding something here?
> > >
> > > (Probably...)
> >
> > The zeroed region is never written. With 1k fs block and 1k merkle
> > tree block and 4k page size we could end up reading only single
> > block, at the end of the tree. But we have to pass PAGE to the
> > fsverity. So, only the 1/4 of the page is read. This if-case zeroes
> > the rest of the page to make it uptodate. In the normal read path
> > this is bound by EOF, but here I use tree size. So, we don't read
> > this unwritten region but zero the folio.
> >
> > The fsverity does zeroing of unused space while construction, but
> > this works only for full fs blocks, therefore, 4k fs block and 1k
> > merkle tree block.
>
> But if the regions you have to zero are outside the merkle tree, then
> fsverity shouldn't ever see those bytes, so why zero them? Or does it
> actually check the uptodate bit? So then you want the folio to have
> well defined contents?
fsverity doesn't check uptodate, but I do in xfs_fsverity_read_merkle().
I think we need that check because it tells us that iomap is done reading,
but it's true that the zeroing is not necessary.
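To make the 1k-block case above concrete, here is a minimal userspace sketch of the tail arithmetic. The helper name tree_tail_zero_len() is hypothetical and is not part of the patch; it only assumes a power-of-two page size.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch: number of bytes at the tail
 * of the last page-cache page of the tree that hold no Merkle tree data.
 * page_size must be a power of two.
 */
static uint64_t tree_tail_zero_len(uint64_t tree_size, uint64_t page_size)
{
	/* Bytes of tree data that land in the last page (0 if page-aligned). */
	uint64_t used = tree_size & (page_size - 1);

	return used ? page_size - used : 0;
}
```

For a 5k tree with 4k pages this gives 3072: a single 1k block is read into the last page and the remaining three quarters of the page hold no tree data.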
>
> (At this point I'm picking at nits :P)
>
> --D
>
> > --
> > - Andrey
> >
> >
>
--
- Andrey
^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [PATCH RFC 01/29] iomap: add iomap_writepages_unbound() to write beyond EOF
2025-07-29 22:07 ` Darrick J. Wong
@ 2025-07-31 15:04 ` Andrey Albershteyn
0 siblings, 0 replies; 75+ messages in thread
From: Andrey Albershteyn @ 2025-07-31 15:04 UTC (permalink / raw)
To: Darrick J. Wong
Cc: fsverity, linux-fsdevel, linux-xfs, david, ebiggers, hch,
Andrey Albershteyn
On 2025-07-29 15:07:43, Darrick J. Wong wrote:
> On Mon, Jul 28, 2025 at 10:30:05PM +0200, Andrey Albershteyn wrote:
> > From: Andrey Albershteyn <aalbersh@redhat.com>
> >
> > Add iomap_writepages_unbound() without limit in form of EOF. XFS
> > will use this to write metadata (fs-verity Merkle tree) in range far
> > beyond EOF.
>
> ...and I guess some day fscrypt might use it to encrypt merkle trees
> too?
Probably; in that case the region offsets will also need to be adjusted.
>
> > Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
> > ---
> > fs/iomap/buffered-io.c | 51 +++++++++++++++++++++++++++++++++++++++-----------
> > include/linux/iomap.h | 3 +++
> > 2 files changed, 43 insertions(+), 11 deletions(-)
> >
> > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> > index 3729391a18f3..7bef232254a3 100644
> > --- a/fs/iomap/buffered-io.c
> > +++ b/fs/iomap/buffered-io.c
> > @@ -1881,18 +1881,10 @@ static int iomap_writepage_map(struct iomap_writepage_ctx *wpc,
> > int error = 0;
> > u32 rlen;
> >
> > - WARN_ON_ONCE(!folio_test_locked(folio));
> > - WARN_ON_ONCE(folio_test_dirty(folio));
> > - WARN_ON_ONCE(folio_test_writeback(folio));
> > -
> > - trace_iomap_writepage(inode, pos, folio_size(folio));
> > -
> > - if (!iomap_writepage_handle_eof(folio, inode, &end_pos)) {
> > - folio_unlock(folio);
> > - return 0;
> > - }
> > WARN_ON_ONCE(end_pos <= pos);
> >
> > + trace_iomap_writepage(inode, pos, folio_size(folio));
> > +
> > if (i_blocks_per_folio(inode, folio) > 1) {
> > if (!ifs) {
> > ifs = ifs_alloc(inode, folio, 0);
> > @@ -1956,6 +1948,23 @@ static int iomap_writepage_map(struct iomap_writepage_ctx *wpc,
> > return error;
> > }
> >
> > +/* Map pages bound by EOF */
> > +static int iomap_writepage_map_eof(struct iomap_writepage_ctx *wpc,
>
> iomap_writepage_map_within_eof ?
sure
>
> > + struct writeback_control *wbc, struct folio *folio)
> > +{
> > + int error;
> > + struct inode *inode = folio->mapping->host;
> > + u64 end_pos = folio_pos(folio) + folio_size(folio);
> > +
> > + if (!iomap_writepage_handle_eof(folio, inode, &end_pos)) {
> > + folio_unlock(folio);
> > + return 0;
> > + }
> > +
> > + error = iomap_writepage_map(wpc, wbc, folio);
> > + return error;
> > +}
> > +
> > int
> > iomap_writepages(struct address_space *mapping, struct writeback_control *wbc,
> > struct iomap_writepage_ctx *wpc,
> > @@ -1972,9 +1981,29 @@ iomap_writepages(struct address_space *mapping, struct writeback_control *wbc,
> > PF_MEMALLOC))
> > return -EIO;
> >
> > + wpc->ops = ops;
> > + while ((folio = writeback_iter(mapping, wbc, folio, &error))) {
> > + WARN_ON_ONCE(!folio_test_locked(folio));
> > + WARN_ON_ONCE(folio_test_dirty(folio));
> > + WARN_ON_ONCE(folio_test_writeback(folio));
> > +
> > + error = iomap_writepage_map_eof(wpc, wbc, folio);
> > + }
> > + return iomap_submit_ioend(wpc, error);
> > +}
> > +EXPORT_SYMBOL_GPL(iomap_writepages);
> > +
> > +int
> > +iomap_writepages_unbound(struct address_space *mapping, struct writeback_control *wbc,
>
> Might want to leave a comment here explaining what's the difference
> between the two iomap_writepages exports:
>
> /*
> * Write dirty pages, including any beyond EOF. This is solely for use
> * by files that allow post-EOF pagecache, which means fsverity.
> */
sure, thanks!
>
> > + struct iomap_writepage_ctx *wpc,
> > + const struct iomap_writeback_ops *ops)
> > +{
> > + struct folio *folio = NULL;
> > + int error;
> > +
>
> ...and you might want a:
>
> WARN_ON(!IS_VERITY(wpc->inode));
>
> to keep it that way.
yup, makes sense, thanks!
>
> --D
>
> > wpc->ops = ops;
> > while ((folio = writeback_iter(mapping, wbc, folio, &error)))
> > error = iomap_writepage_map(wpc, wbc, folio);
> > return iomap_submit_ioend(wpc, error);
> > }
> > -EXPORT_SYMBOL_GPL(iomap_writepages);
> > +EXPORT_SYMBOL_GPL(iomap_writepages_unbound);
> > diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> > index 522644d62f30..4a0b5ebb79e9 100644
> > --- a/include/linux/iomap.h
> > +++ b/include/linux/iomap.h
> > @@ -464,6 +464,9 @@ void iomap_sort_ioends(struct list_head *ioend_list);
> > int iomap_writepages(struct address_space *mapping,
> > struct writeback_control *wbc, struct iomap_writepage_ctx *wpc,
> > const struct iomap_writeback_ops *ops);
> > +int iomap_writepages_unbound(struct address_space *mapping,
> > + struct writeback_control *wbc, struct iomap_writepage_ctx *wpc,
> > + const struct iomap_writeback_ops *ops);
> >
> > /*
> > * Flags for direct I/O ->end_io:
> >
> > --
> > 2.50.0
> >
> >
>
--
- Andrey
^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [PATCH RFC 22/29] xfs: add fs-verity support
2025-07-31 14:50 ` Andrey Albershteyn
@ 2025-07-31 15:07 ` Darrick J. Wong
0 siblings, 0 replies; 75+ messages in thread
From: Darrick J. Wong @ 2025-07-31 15:07 UTC (permalink / raw)
To: Andrey Albershteyn
Cc: fsverity, linux-fsdevel, linux-xfs, david, ebiggers, hch,
Andrey Albershteyn
On Thu, Jul 31, 2025 at 04:50:02PM +0200, Andrey Albershteyn wrote:
> On 2025-07-29 16:05:36, Darrick J. Wong wrote:
> > On Mon, Jul 28, 2025 at 10:30:26PM +0200, Andrey Albershteyn wrote:
> > > Add integration with fs-verity. XFS stores fs-verity descriptor in
> > > the extended file attributes and the Merkle tree is store in data fork
> > > beyond EOF.
> > >
> > > The descriptor is stored under "vdesc" extended attribute.
> > >
> > > The Merkle tree reading/writing is done through iomap interface. The
> > > data itself are at offset 1 << 53 in the inode's page cache. When XFS
> > > reads from this region iomap doesn't call into fsverity to verify it
> > > against Merkle tree. For data, verification is done on BIO completion.
> > >
> > > When fs-verity is enabled on an inode, the XFS_IVERITY_CONSTRUCTION
> > > flag is set meaning that the Merkle tree is being build. The
> > > initialization ends with storing of verity descriptor and setting
> > > inode on-disk flag (XFS_DIFLAG2_VERITY).
> > >
> > > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> >
> > /me wonders if this is really SoB me since it's rather different from
> > the last time I even touched this patch. It also means I can't RVB it.
> > ;)
>
> Yeah, it probably make sense to drop it, because I dropped most of
> the blob cache related code
>
> >
> > > Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
> > > ---
> > > fs/xfs/Makefile | 1 +
> > > fs/xfs/libxfs/xfs_da_format.h | 4 +
> > > fs/xfs/xfs_bmap_util.c | 7 +
> > > fs/xfs/xfs_fsverity.c | 311 ++++++++++++++++++++++++++++++++++++++++++
> > > fs/xfs/xfs_fsverity.h | 28 ++++
> > > fs/xfs/xfs_inode.h | 6 +
> > > fs/xfs/xfs_super.c | 20 +++
> > > 7 files changed, 377 insertions(+)
> > >
> > > diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
> > > index 5bf501cf8271..ad66439db7bf 100644
> > > --- a/fs/xfs/Makefile
> > > +++ b/fs/xfs/Makefile
> > > @@ -147,6 +147,7 @@ xfs-$(CONFIG_XFS_POSIX_ACL) += xfs_acl.o
> > > xfs-$(CONFIG_SYSCTL) += xfs_sysctl.o
> > > xfs-$(CONFIG_COMPAT) += xfs_ioctl32.o
> > > xfs-$(CONFIG_EXPORTFS_BLOCK_OPS) += xfs_pnfs.o
> > > +xfs-$(CONFIG_FS_VERITY) += xfs_fsverity.o
> > >
> > > # notify failure
> > > ifeq ($(CONFIG_MEMORY_FAILURE),y)
> > > diff --git a/fs/xfs/libxfs/xfs_da_format.h b/fs/xfs/libxfs/xfs_da_format.h
> > > index e5274be2fe9c..b17fdbbb48aa 100644
> > > --- a/fs/xfs/libxfs/xfs_da_format.h
> > > +++ b/fs/xfs/libxfs/xfs_da_format.h
> > > @@ -909,4 +909,8 @@ struct xfs_parent_rec {
> > > __be32 p_gen;
> > > } __packed;
> > >
> > > +/* fs-verity ondisk xattr name used for the fsverity descriptor */
> > > +#define XFS_VERITY_DESCRIPTOR_NAME "vdesc"
> > > +#define XFS_VERITY_DESCRIPTOR_NAME_LEN (sizeof(XFS_VERITY_DESCRIPTOR_NAME) - 1)
> > > +
> > > #endif /* __XFS_DA_FORMAT_H__ */
> > > diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
> > > index 06ca11731e43..af1933129647 100644
> > > --- a/fs/xfs/xfs_bmap_util.c
> > > +++ b/fs/xfs/xfs_bmap_util.c
> > > @@ -31,6 +31,7 @@
> > > #include "xfs_rtbitmap.h"
> > > #include "xfs_rtgroup.h"
> > > #include "xfs_zone_alloc.h"
> > > +#include <linux/fsverity.h>
> > >
> > > /* Kernel only BMAP related definitions and functions */
> > >
> > > @@ -553,6 +554,12 @@ xfs_can_free_eofblocks(
> > > if (last_fsb <= end_fsb)
> > > return false;
> > >
> > > + /*
> > > + * Nothing to clean on fsverity inodes as they are read-only
> > > + */
> > > + if (IS_VERITY(VFS_I(ip)))
> > > + return false;
> > > +
> > > /*
> > > * Check if there is an post-EOF extent to free. If there are any
> > > * delalloc blocks attached to the inode (data fork delalloc
> > > diff --git a/fs/xfs/xfs_fsverity.c b/fs/xfs/xfs_fsverity.c
> > > new file mode 100644
> > > index 000000000000..ec7dea4289d5
> > > --- /dev/null
> > > +++ b/fs/xfs/xfs_fsverity.c
> > > @@ -0,0 +1,311 @@
> > > +/* SPDX-License-Identifier: GPL-2.0 */
> > > +/*
> > > + * Copyright (C) 2023 Red Hat, Inc.
> > > + */
> > > +#include "xfs.h"
> > > +#include "xfs_shared.h"
> > > +#include "xfs_format.h"
> > > +#include "xfs_da_format.h"
> > > +#include "xfs_da_btree.h"
> > > +#include "xfs_trans_resv.h"
> > > +#include "xfs_mount.h"
> > > +#include "xfs_inode.h"
> > > +#include "xfs_log_format.h"
> > > +#include "xfs_attr.h"
> > > +#include "xfs_bmap_util.h"
> > > +#include "xfs_log_format.h"
> > > +#include "xfs_trans.h"
> > > +#include "xfs_attr_leaf.h"
> > > +#include "xfs_trace.h"
> > > +#include "xfs_quota.h"
> > > +#include "xfs_ag.h"
> > > +#include "xfs_fsverity.h"
> > > +#include "xfs_iomap.h"
> > > +#include <linux/fsverity.h>
> > > +#include <linux/pagemap.h>
> > > +
> > > +/*
> > > + * Initialize an args structure to load or store the fsverity descriptor.
> > > + * Caller must ensure @args is zeroed except for value and valuelen.
> > > + */
> > > +static inline void
> > > +xfs_fsverity_init_vdesc_args(
> > > + struct xfs_inode *ip,
> > > + struct xfs_da_args *args)
> > > +{
> > > + args->geo = ip->i_mount->m_attr_geo;
> > > + args->whichfork = XFS_ATTR_FORK;
> > > + args->attr_filter = XFS_ATTR_VERITY;
> > > + args->op_flags = XFS_DA_OP_OKNOENT;
> > > + args->dp = ip;
> > > + args->owner = ip->i_ino;
> > > + args->name = XFS_VERITY_DESCRIPTOR_NAME;
> > > + args->namelen = XFS_VERITY_DESCRIPTOR_NAME_LEN;
> > > + xfs_attr_sethash(args);
> > > +}
> > > +
> > > +/* Delete the verity descriptor. */
> > > +static int
> > > +xfs_fsverity_delete_descriptor(
> > > + struct xfs_inode *ip)
> > > +{
> > > + struct xfs_da_args args = { };
> > > +
> > > + xfs_fsverity_init_vdesc_args(ip, &args);
> > > + return xfs_attr_set(&args, XFS_ATTRUPDATE_REMOVE, false);
> > > +}
> > > +
> > > +/* Retrieve the verity descriptor. */
> > > +static int
> > > +xfs_fsverity_get_descriptor(
> > > + struct inode *inode,
> > > + void *buf,
> > > + size_t buf_size)
> > > +{
> > > + struct xfs_inode *ip = XFS_I(inode);
> > > + struct xfs_da_args args = {
> > > + .value = buf,
> > > + .valuelen = buf_size,
> > > + };
> > > + int error = 0;
> > > +
> > > + /*
> > > + * The fact that (returned attribute size) == (provided buf_size) is
> > > + * checked by xfs_attr_copy_value() (returns -ERANGE). No descriptor
> > > + * is treated as a short read so that common fsverity code will
> > > + * complain.
> > > + */
> > > + xfs_fsverity_init_vdesc_args(ip, &args);
> > > + error = xfs_attr_get(&args);
> > > + if (error == -ENOATTR)
> > > + return 0;
> > > + if (error)
> > > + return error;
> > > +
> > > + return args.valuelen;
> > > +}
> > > +
> > > +/* Try to remove all the fsverity metadata after a failed enablement. */
> > > +static int
> > > +xfs_fsverity_delete_metadata(
> > > + struct xfs_inode *ip)
> > > +{
> > > + struct xfs_trans *tp;
> > > + struct xfs_mount *mp = ip->i_mount;
> > > + int error;
> > > +
> > > + error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, 0, 0, 0, &tp);
> > > + if (error) {
> > > + ASSERT(xfs_is_shutdown(mp));
> > > + return error;
> > > + }
> > > +
> > > + xfs_ilock(ip, XFS_ILOCK_EXCL);
> > > + xfs_trans_ijoin(tp, ip, 0);
> > > +
> > > + /*
> > > + * As only merkle tree is getting removed, no need to change inode size
> > > + */
> > > + error = xfs_itruncate_extents(&tp, ip, XFS_DATA_FORK, XFS_ISIZE(ip));
> > > + if (error)
> > > + goto err_cancel;
> > > +
> > > + error = xfs_trans_commit(tp);
> > > + xfs_iunlock(ip, XFS_ILOCK_EXCL);
> > > + if (error)
> > > + return error;
> > > +
> > > + error = xfs_fsverity_delete_descriptor(ip);
> > > + return error != -ENOATTR ? error : 0;
> >
> > Do you need to re-truncate the file to EOF to clear out the post-eof
> > blocks?
>
> yes, any prealloc extents, or failed merkle tree construction
> extents
>
> >
> > > +
> > > +err_cancel:
> > > + xfs_trans_cancel(tp);
> > > + return error;
> > > +}
> > > +
> > > +
> > > +/* Prepare to enable fsverity by clearing old metadata. */
> > > +static int
> > > +xfs_fsverity_begin_enable(
> > > + struct file *filp)
> > > +{
> > > + struct inode *inode = file_inode(filp);
> > > + struct xfs_inode *ip = XFS_I(inode);
> > > + int error;
> > > +
> > > + xfs_assert_ilocked(ip, XFS_IOLOCK_EXCL);
Hrm, do you need to take MMAPLOCK_EXCL here to lock out write faults?
The answer might be no if page_mkwrite looks for VERITY_CONSTRUCTION
and errors out.
> > > + if (IS_DAX(inode))
> > > + return -EINVAL;
> > > +
> > > + if (inode->i_size > XFS_FSVERITY_MTREE_OFFSET)
> > > + return -EFBIG;
> > > +
> > > + if (xfs_iflags_test_and_set(ip, XFS_VERITY_CONSTRUCTION))
> > > + return -EBUSY;
> >
> > Hm. Do we actually flush the pagecache to disk in the _begin_enable
> > path? /me looked briefly but didn't see anything.
>
> hmm, you're right, this is missing, my memory had an impression that
> fsverity code does it but it doesn't
>
> I'm not sure that this is actually needed here as we will do
> filemap_write_and_wait() in _end_enable() while inode is being
> locked and only writting happening is merkle tree construction.
I guess this depends on what Eric thinks is the sequence for enabling
verity. I'd have thought that you'd want to stabilize the file on disk
before constructing the merkle tree, but the ext4 implementation doesn't
do that afaict.
> > > +
> > > + error = xfs_qm_dqattach(ip);
> > > + if (error)
> > > + return error;
> > > +
> > > + return xfs_fsverity_delete_metadata(ip);
> > > +}
> > > +
> > > +/* Complete (or fail) the process of enabling fsverity. */
> > > +static int
> > > +xfs_fsverity_end_enable(
> > > + struct file *filp,
> > > + const void *desc,
> > > + size_t desc_size,
> > > + u64 merkle_tree_size)
> > > +{
> > > + struct xfs_da_args args = {
> > > + .value = (void *)desc,
> > > + .valuelen = desc_size,
> > > + };
> > > + struct inode *inode = file_inode(filp);
> > > + struct xfs_inode *ip = XFS_I(inode);
> > > + struct xfs_mount *mp = ip->i_mount;
> > > + struct xfs_trans *tp;
> > > + int error = 0;
> > > +
> > > + xfs_assert_ilocked(ip, XFS_IOLOCK_EXCL);
> > > +
> > > + /* fs-verity failed, just cleanup */
> > > + if (desc == NULL)
> > > + goto out;
> > > +
> > > + xfs_fsverity_init_vdesc_args(ip, &args);
> > > + error = xfs_attr_set(&args, XFS_ATTRUPDATE_UPSERT, false);
> > > + if (error)
> > > + goto out;
> > > +
> > > + /*
> > > + * Wait for Merkel tree get written to disk before setting on-disk inode
> > > + * flag and clering XFS_VERITY_CONSTRUCTION
> >
> > s/Merkel/Merkle/
> >
> > s/clering/clearing/ , sorry if that's my typo
>
> thanks!
>
> >
> > > + */
> > > + error = filemap_write_and_wait(inode->i_mapping);
> > > + if (error)
> > > + return error;
> > > +
> > > + /* Set fsverity inode flag */
> > > + error = xfs_trans_alloc_inode(ip, &M_RES(mp)->tr_ichange,
> > > + 0, 0, false, &tp);
> > > + if (error)
> > > + goto out;
> > > +
> > > + /*
> > > + * Ensure that we've persisted the verity information before we enable
> > > + * it on the inode and tell the caller we have sealed the inode.
> > > + */
> > > + ip->i_diflags2 |= XFS_DIFLAG2_VERITY;
> > > +
> > > + xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
> > > + xfs_trans_set_sync(tp);
> > > +
> > > + error = xfs_trans_commit(tp);
> > > + xfs_iunlock(ip, XFS_ILOCK_EXCL);
> > > +
> > > + if (!error)
> > > + inode->i_flags |= S_VERITY;
> > > +
> > > +out:
> > > + if (error) {
> > > + int error2;
> > > +
> > > + error2 = xfs_fsverity_delete_metadata(ip);
> > > + if (error2)
> > > + xfs_alert(ip->i_mount,
> > > +"ino 0x%llx failed to clean up new fsverity metadata, err %d",
> > > + ip->i_ino, error2);
> > > + }
> > > +
> > > + xfs_iflags_clear(ip, XFS_VERITY_CONSTRUCTION);
> > > + return error;
> > > +}
> > > +
> > > +static void
> > > +xfs_fsverity_adjust_read(
> > > + struct ioregion *region)
> > > +{
> > > + u8 log_blocksize;
> > > + unsigned int block_size;
> > > + u64 tree_size;
> > > + u64 position = region->pos & XFS_FSVERITY_MTREE_MASK;
> > > +
> > > + fsverity_merkle_tree_geometry(region->inode, &log_blocksize,
> > > + &block_size, &tree_size);
> > > +
> > > + if (position + region->length < tree_size)
> > > + return;
> > > +
> > > + region->length = tree_size - position;
> > > +}
> > > +
> > > +/* Retrieve a merkle tree block. */
> > > +static struct page *
> > > +xfs_fsverity_read_merkle(
> > > + struct inode *inode,
> > > + pgoff_t index,
> > > + unsigned long num_ra_pages)
> > > +{
> > > + u64 position = (index << PAGE_SHIFT) | XFS_FSVERITY_MTREE_OFFSET;
> >
> > Is this shift safe on 32-bit where pgoff_t is unsigned long?
> > I don't think it is....?
>
> u64 position = ((loff_t)index << PAGE_SHIFT) | XFS_FSVERITY_MTREE_OFFSET;
> like this?
Yeah. I really wonder if the pagecache needs to grow a typed helper:
static inline loff_t pgoff_to_loff(pgoff_t i)
{
return ((loff_t)i) << PAGE_SHIFT;
}
loff_t position = pgoff_to_loff(index) + XFS_FSVERITY_MTREE_OFFSET;
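A minimal userspace sketch of why the cast has to come before the shift. It assumes 4k pages and models a 32-bit pgoff_t with uint32_t; the typedef and both function names are made up for illustration and are not kernel code.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4k pages */

/* Models pgoff_t on a 32-bit kernel, where unsigned long is 32 bits wide. */
typedef uint32_t pgoff32_t;

/* Shift done in 32 bits: the high bits are lost before the value is widened. */
static int64_t shift_then_widen(pgoff32_t index)
{
	return (int64_t)(uint32_t)(index << SKETCH_PAGE_SHIFT);
}

/* Widen first, then shift: the full byte offset is preserved. */
static int64_t widen_then_shift(pgoff32_t index)
{
	return ((int64_t)index) << SKETCH_PAGE_SHIFT;
}
```

With index = 1 << 20 (the page at byte offset 4 GiB), the first variant wraps to 0 while the second returns 4294967296.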
> >
> > > + struct ioregion region = {
> > > + .inode = inode,
> > > + .pos = position,
> > > + .length = PAGE_SIZE,
> > > + .ops = &xfs_read_iomap_ops,
> > > + };
> > > + int error;
> > > + struct folio *folio;
> > > +
> > > + /*
> > > + * As region->length is PAGE_SIZE we have to adjust the length for the
> > > + * end of the tree. The case when tree blocks size are smaller then
> > > + * PAGE_SIZE.
> >
> > Hrm. I wonder if we want to try to create large folios for merkle
> > trees?
>
> I think we can as we just need to then pick the right page fsverity
> requested
Oh, right, and you can get individual pages from a large folio.
> >
> > Since the comment references PAGE_SIZE, what happens if this is an 8k
> > fsblock XFS filesystem where folios will always be at least 8k?
>
> I think this should be fine, the default is PAGE_SIZE because
> fsverity is interested in pages, so, start from this and adjust if
> we don't have data to fill it.
<nod>
--D
> >
> > > + */
> > > + xfs_fsverity_adjust_read(&region);
> > > +
> > > + folio = iomap_read_region(&region);
> > > + if (IS_ERR(folio))
> > > + return ERR_PTR(-EIO);
> > > +
> > > + /* Wait for buffered read to finish */
> > > + error = folio_wait_locked_killable(folio);
> > > + if (error)
> > > + return ERR_PTR(error);
> > > + if (IS_ERR(folio) || !folio_test_uptodate(folio))
> > > + return ERR_PTR(-EFSCORRUPTED);
> > > +
> > > + return folio_file_page(folio, 0);
> > > +}
> > > +
> > > +/* Write a merkle tree block. */
> > > +static int
> > > +xfs_fsverity_write_merkle(
> > > + struct inode *inode,
> > > + const void *buf,
> > > + u64 pos,
> > > + unsigned int size)
> > > +{
> > > + struct ioregion region = {
> > > + .inode = inode,
> > > + .pos = pos | XFS_FSVERITY_MTREE_OFFSET,
> > > + .buf = buf,
> > > + .length = size,
> > > + .ops = &xfs_buffered_write_iomap_ops,
> > > + };
> > > +
> > > + if (region.pos + region.length > inode->i_sb->s_maxbytes)
> > > + return -EFBIG;
> > > +
> > > + return iomap_write_region(&region);
> > > +}
> > > +
> > > +const struct fsverity_operations xfs_fsverity_ops = {
> > > + .begin_enable_verity = xfs_fsverity_begin_enable,
> > > + .end_enable_verity = xfs_fsverity_end_enable,
> > > + .get_verity_descriptor = xfs_fsverity_get_descriptor,
> > > + .read_merkle_tree_page = xfs_fsverity_read_merkle,
> > > + .write_merkle_tree_block = xfs_fsverity_write_merkle,
> > > +};
> > > diff --git a/fs/xfs/xfs_fsverity.h b/fs/xfs/xfs_fsverity.h
> > > new file mode 100644
> > > index 000000000000..e063b7288dc0
> > > --- /dev/null
> > > +++ b/fs/xfs/xfs_fsverity.h
> > > @@ -0,0 +1,28 @@
> > > +/* SPDX-License-Identifier: GPL-2.0 */
> > > +/*
> > > + * Copyright (C) 2022 Red Hat, Inc.
> > > + */
> > > +#ifndef __XFS_FSVERITY_H__
> > > +#define __XFS_FSVERITY_H__
> > > +
> > > +/* Merkle tree location in page cache. We take memory region from the inode's
> > > + * address space for Merkle tree.
> > > + *
> > > + * At maximum of 8 levels with 128 hashes per block (32 bytes SHA-256) maximum
> >
> > Aren't sha512 hashes allowed too? In which case it would be 64 bytes
> > per hash?
> >
> > I suppose that doesn't really matter since... an 8EB file on a 1k
> > fsblock filesystem with 1k merkle tree blocks will do this:
> >
> > # xfs_db /dev/sda -c "btheight -b 1024 -n $(( (2**63 - 1) / 1024 ))
> > 64:64:0:merkle -w min"
> > 64:64:0:merkle: worst case per 1024-byte block: 8 records (leaf) / 8
> > keyptrs (node)
> > level 0: 9007199254740991 records, 1125899906842624 blocks
> > level 1: 1125899906842624 records, 140737488355328 blocks
> > level 2: 140737488355328 records, 17592186044416 blocks
> > level 3: 17592186044416 records, 2199023255552 blocks
> > level 4: 2199023255552 records, 274877906944 blocks
> > level 5: 274877906944 records, 34359738368 blocks
> > level 6: 34359738368 records, 4294967296 blocks
> > level 7: 4294967296 records, 536870912 blocks
> > level 8: 536870912 records, 67108864 blocks
> > level 9: 67108864 records, 8388608 blocks
> > level 10: 8388608 records, 1048576 blocks
> > level 11: 1048576 records, 131072 blocks
> > level 12: 131072 records, 16384 blocks
> > level 13: 16384 records, 2048 blocks
> > level 14: 2048 records, 256 blocks
> > level 15: 256 records, 32 blocks
> > level 16: 32 records, 4 blocks
> > level 17: 4 records, 1 block
> > 18 levels, 1286742750677285 blocks total
> >
> > log2(1286742750677285) is 50, so your choice of 53 is fine. Though this
> > will far exceed the max merkle tree height anyway.
>
> yup, the 53 seems to be fine even for 1k.
>
> >
> > > + * tree size is ((128^8 − 1)/(128 − 1)) = 567*10^12 blocks. This should fit in 53
> > > + * bits address space.
> > > + *
> > > + * At this Merkle tree size we can cover 295EB large file. This is much larger
> > > + * than the currently supported file size.
> > > + *
> > > + * As we allocate some pagecache space for fsverity tree we need to limit
> > > + * maximum fsverity file size.
> > > + */
> > > +#define XFS_FSVERITY_MTREE_OFFSET (1ULL << 53)
> > > +#define XFS_FSVERITY_MTREE_MASK (XFS_FSVERITY_MTREE_OFFSET - 1)
> >
> > This is part of the ondisk format, so it belongs in xfs_fs.h.
>
> oh, will move it
>
> >
> > --D
> >
> > > +#ifdef CONFIG_FS_VERITY
> > > +extern const struct fsverity_operations xfs_fsverity_ops;
> > > +#endif /* CONFIG_FS_VERITY */
> > > +
> > > +#endif /* __XFS_FSVERITY_H__ */
> > > diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
> > > index d7e2b902ef5c..033e1a71e64b 100644
> > > --- a/fs/xfs/xfs_inode.h
> > > +++ b/fs/xfs/xfs_inode.h
> > > @@ -404,6 +404,12 @@ static inline bool xfs_inode_can_hw_atomic_write(const struct xfs_inode *ip)
> > > */
> > > #define XFS_IREMAPPING (1U << 15)
> > >
> > > +/*
> > > + * fs-verity's Merkle tree is under construction. The file is read-only, the
> > > + * only writes happening is the ones with Merkle tree blocks.
> > > + */
> > > +#define XFS_VERITY_CONSTRUCTION (1U << 16)
> > > +
> > > /* All inode state flags related to inode reclaim. */
> > > #define XFS_ALL_IRECLAIM_FLAGS (XFS_IRECLAIMABLE | \
> > > XFS_IRECLAIM | \
> > > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> > > index 7b1ace75955c..10fb23ac672a 100644
> > > --- a/fs/xfs/xfs_super.c
> > > +++ b/fs/xfs/xfs_super.c
> > > @@ -30,6 +30,7 @@
> > > #include "xfs_filestream.h"
> > > #include "xfs_quota.h"
> > > #include "xfs_sysfs.h"
> > > +#include "xfs_fsverity.h"
> > > #include "xfs_ondisk.h"
> > > #include "xfs_rmap_item.h"
> > > #include "xfs_refcount_item.h"
> > > @@ -54,6 +55,7 @@
> > > #include <linux/fs_context.h>
> > > #include <linux/fs_parser.h>
> > > #include <linux/fsverity.h>
> > > +#include <linux/iomap.h>
> > >
> > > static const struct super_operations xfs_super_operations;
> > >
> > > @@ -1711,6 +1713,9 @@ xfs_fs_fill_super(
> > > sb->s_quota_types = QTYPE_MASK_USR | QTYPE_MASK_GRP | QTYPE_MASK_PRJ;
> > > #endif
> > > sb->s_op = &xfs_super_operations;
> > > +#ifdef CONFIG_FS_VERITY
> > > + sb->s_vop = &xfs_fsverity_ops;
> > > +#endif
> > >
> > > /*
> > > * Delay mount work if the debug hook is set. This is debug
> > > @@ -1964,10 +1969,25 @@ xfs_fs_fill_super(
> > > xfs_set_resuming_quotaon(mp);
> > > mp->m_qflags &= ~XFS_QFLAGS_MNTOPTS;
> > >
> > > + if (xfs_has_verity(mp))
> > > + xfs_warn(mp,
> > > + "EXPERIMENTAL fsverity feature in use. Use at your own risk!");
> > > +
> > > error = xfs_mountfs(mp);
> > > if (error)
> > > goto out_filestream_unmount;
> > >
> > > +#ifdef CONFIG_FS_VERITY
> > > + /*
> > > + * Don't use a high priority workqueue like the other fsverity
> > > + * implementations because that will lead to conflicts with the xfs log
> > > + * workqueue.
> > > + */
> > > + error = iomap_init_fsverity(mp->m_super, 0, 0);
> > > + if (error)
> > > + goto out_unmount;
> > > +#endif
> > > +
> > > root = igrab(VFS_I(mp->m_rootip));
> > > if (!root) {
> > > error = -ENOENT;
> > >
> > > --
> > > 2.50.0
> > >
> > >
> >
>
> --
> - Andrey
>
>
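The block-count arithmetic from the quoted header comment (8 levels, 128 hashes per block) can be checked with a small userspace sketch. It counts tree blocks only, not bytes, and the demo_* helpers are hypothetical names introduced for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Sum of fanout^0 + ... + fanout^(levels - 1): blocks in a full tree. */
static inline uint64_t demo_merkle_max_blocks(uint64_t fanout,
					      unsigned int levels)
{
	uint64_t total = 0, level_blocks = 1;

	for (unsigned int i = 0; i < levels; i++) {
		total += level_blocks;
		level_blocks *= fanout;
	}
	return total;
}

/* Number of bits needed to represent v (0 for v == 0). */
static inline unsigned int demo_bits_needed(uint64_t v)
{
	unsigned int bits = 0;

	while (v) {
		bits++;
		v >>= 1;
	}
	return bits;
}
```

For a fanout of 128 and 8 levels this gives 567382630219905 blocks, which matches the 567*10^12 figure in the comment and needs 50 bits as a block count.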
^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [PATCH RFC 13/29] iomap: integrate fs-verity verification into iomap's read path
2025-07-31 15:01 ` Andrey Albershteyn
@ 2025-07-31 15:08 ` Darrick J. Wong
0 siblings, 0 replies; 75+ messages in thread
From: Darrick J. Wong @ 2025-07-31 15:08 UTC (permalink / raw)
To: Andrey Albershteyn
Cc: fsverity, linux-fsdevel, linux-xfs, david, ebiggers, hch,
Andrey Albershteyn
On Thu, Jul 31, 2025 at 05:01:48PM +0200, Andrey Albershteyn wrote:
> On 2025-07-31 07:52:33, Darrick J. Wong wrote:
> > On Thu, Jul 31, 2025 at 01:34:24PM +0200, Andrey Albershteyn wrote:
> > > On 2025-07-29 16:21:52, Darrick J. Wong wrote:
> > > > On Mon, Jul 28, 2025 at 10:30:17PM +0200, Andrey Albershteyn wrote:
> > > > > From: Andrey Albershteyn <aalbersh@redhat.com>
> > > > >
> > > > > This patch adds fs-verity verification into iomap's read path. After
> > > > > BIO's io operation is complete the data are verified against
> > > > > fs-verity's Merkle tree. Verification work is done in a separate
> > > > > workqueue.
> > > > >
> > > > > The read path ioend iomap_read_ioend are stored side by side with
> > > > > BIOs if FS_VERITY is enabled.
> > > > >
> > > > > [djwong: fix doc warning]
> > > > > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > > > > Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
> > > > > ---
> > > > > fs/iomap/buffered-io.c | 151 +++++++++++++++++++++++++++++++++++++++++++++++--
> > > > > fs/iomap/ioend.c | 41 +++++++++++++-
> > > > > include/linux/iomap.h | 13 +++++
> > > > > 3 files changed, 198 insertions(+), 7 deletions(-)
> > > > >
> > > > > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> > > > > index e959a206cba9..87c974e543e0 100644
> > > > > --- a/fs/iomap/buffered-io.c
> > > > > +++ b/fs/iomap/buffered-io.c
> > > > > @@ -6,6 +6,7 @@
> > > > > #include <linux/module.h>
> > > > > #include <linux/compiler.h>
> > > > > #include <linux/fs.h>
> > > > > +#include <linux/fsverity.h>
> > > > > #include <linux/iomap.h>
> > > > > #include <linux/pagemap.h>
> > > > > #include <linux/uio.h>
> > > > > @@ -363,6 +364,116 @@ static inline bool iomap_block_needs_zeroing(const struct iomap_iter *iter,
> > > > > pos >= i_size_read(iter->inode);
> > > > > }
> > > > >
> > > > > +#ifdef CONFIG_FS_VERITY
> > > > > +int iomap_init_fsverity(struct super_block *sb, unsigned int wq_flags,
> > > > > + int max_active)
> > > > > +{
> > > > > + int ret;
> > > > > +
> > > > > + if (!iomap_fsverity_bioset) {
> > > > > + ret = iomap_fsverity_init_bioset();
> > > > > + if (ret)
> > > > > + return ret;
> > > > > + }
> > > > > +
> > > > > + return fsverity_init_wq(sb, wq_flags, max_active);
> > > > > +}
> > > > > +EXPORT_SYMBOL_GPL(iomap_init_fsverity);
> > > > > +
> > > > > +static void
> > > > > +iomap_read_fsverify_end_io_work(struct work_struct *work)
> > > > > +{
> > > > > + struct iomap_fsverity_bio *fbio =
> > > > > + container_of(work, struct iomap_fsverity_bio, work);
> > > > > +
> > > > > + fsverity_verify_bio(&fbio->bio);
> > > > > + iomap_read_end_io(&fbio->bio);
> > > > > +}
> > > > > +
> > > > > +static void
> > > > > +iomap_read_fsverity_end_io(struct bio *bio)
> > > > > +{
> > > > > + struct iomap_fsverity_bio *fbio =
> > > > > + container_of(bio, struct iomap_fsverity_bio, bio);
> > > > > +
> > > > > + INIT_WORK(&fbio->work, iomap_read_fsverify_end_io_work);
> > > > > + queue_work(bio->bi_private, &fbio->work);
> > > > > +}
> > > > > +
> > > > > +static struct bio *
> > > > > +iomap_fsverity_read_bio_alloc(struct inode *inode, struct block_device *bdev,
> > > > > + int nr_vecs, gfp_t gfp)
> > > > > +{
> > > > > + struct bio *bio;
> > > > > +
> > > > > + bio = bio_alloc_bioset(bdev, nr_vecs, REQ_OP_READ, gfp,
> > > > > + iomap_fsverity_bioset);
> > > > > + if (bio) {
> > > > > + bio->bi_private = inode->i_sb->s_verity_wq;
> > > > > + bio->bi_end_io = iomap_read_fsverity_end_io;
> > > > > + }
> > > > > + return bio;
> > > > > +}
> > > > > +
> > > > > +/*
> > > > > + * True if tree is not aligned with fs block/folio size and we need zero tail
> > > > > + * part of the folio
> > > > > + */
> > > > > +static bool
> > > > > +iomap_fsverity_tree_end_align(struct iomap_iter *iter, struct folio *folio,
> > > > > + loff_t pos, size_t plen)
> > > > > +{
> > > > > + int error;
> > > > > + u8 log_blocksize;
> > > > > + u64 tree_size, tree_mask, last_block_tree, last_block_pos;
> > > > > +
> > > > > + /* Not a Merkle tree */
> > > > > + if (!(iter->iomap.flags & IOMAP_F_BEYOND_EOF))
> > > > > + return false;
> > > > > +
> > > > > + if (plen == folio_size(folio))
> > > > > + return false;
> > > > > +
> > > > > + if (iter->inode->i_blkbits == folio_shift(folio))
> > > > > + return false;
> > > > > +
> > > > > + error = fsverity_merkle_tree_geometry(iter->inode, &log_blocksize, NULL,
> > > > > + &tree_size);
> > > > > + if (error)
> > > > > + return false;
> > > > > +
> > > > > + /*
> > > > > > + * We are beyond EOF reading the Merkle tree. Therefore, it has the
> > > > > > + * highest offset. Mask pos with the tree size to get our position
> > > > > > + * within the tree. Then, compare the index of the last tree block
> > > > > > + * with the index of the block at the current pos.
> > > > > + */
> > > > > + last_block_tree = (tree_size + PAGE_SIZE - 1) >> PAGE_SHIFT;
> > > > > + tree_mask = (1 << fls64(tree_size)) - 1;
> > > > > + last_block_pos = ((pos & tree_mask) >> PAGE_SHIFT) + 1;
> > > > > +
> > > > > + return last_block_tree == last_block_pos;
> > > > > +}
> > > > > +#else
> > > > > +# define iomap_fsverity_read_bio_alloc(...) (NULL)
> > > > > +# define iomap_fsverity_tree_end_align(...) (false)
> > > > > +#endif /* CONFIG_FS_VERITY */
> > > > > +
> > > > > +static struct bio *iomap_read_bio_alloc(struct inode *inode,
> > > > > + const struct iomap *iomap, int nr_vecs, gfp_t gfp)
> > > > > +{
> > > > > + struct bio *bio;
> > > > > + struct block_device *bdev = iomap->bdev;
> > > > > +
> > > > > + if (fsverity_active(inode) && !(iomap->flags & IOMAP_F_BEYOND_EOF))
> > > > > + return iomap_fsverity_read_bio_alloc(inode, bdev, nr_vecs, gfp);
> > > > > +
> > > > > + bio = bio_alloc(bdev, nr_vecs, REQ_OP_READ, gfp);
> > > > > + if (bio)
> > > > > + bio->bi_end_io = iomap_read_end_io;
> > > > > + return bio;
> > > > > +}
> > > > > +
> > > > > static int iomap_readpage_iter(struct iomap_iter *iter,
> > > > > struct iomap_readpage_ctx *ctx)
> > > > > {
> > > > > @@ -375,6 +486,10 @@ static int iomap_readpage_iter(struct iomap_iter *iter,
> > > > > sector_t sector;
> > > > > int ret;
> > > > >
> > > > > + /* Fail reads from broken fsverity files immediately. */
> > > > > + if (IS_VERITY(iter->inode) && !fsverity_active(iter->inode))
> > > > > + return -EIO;
> > > > > +
> > > > > if (iomap->type == IOMAP_INLINE) {
> > > > > ret = iomap_read_inline_data(iter, folio);
> > > > > if (ret)
> > > > > @@ -391,6 +506,11 @@ static int iomap_readpage_iter(struct iomap_iter *iter,
> > > > > if (iomap_block_needs_zeroing(iter, pos) &&
> > > > > !(iomap->flags & IOMAP_F_BEYOND_EOF)) {
> > > > > folio_zero_range(folio, poff, plen);
> > > > > + if (fsverity_active(iter->inode) &&
> > > > > + !fsverity_verify_blocks(folio, plen, poff)) {
> > > > > + return -EIO;
> > > > > + }
> > > > > +
> > > > > iomap_set_range_uptodate(folio, poff, plen);
> > > > > goto done;
> > > > > }
> > > > > @@ -408,32 +528,51 @@ static int iomap_readpage_iter(struct iomap_iter *iter,
> > > > > !bio_add_folio(ctx->bio, folio, plen, poff)) {
> > > > > gfp_t gfp = mapping_gfp_constraint(folio->mapping, GFP_KERNEL);
> > > > > gfp_t orig_gfp = gfp;
> > > > > - unsigned int nr_vecs = DIV_ROUND_UP(length, PAGE_SIZE);
> > > > >
> > > > > if (ctx->bio)
> > > > > submit_bio(ctx->bio);
> > > > >
> > > > > if (ctx->rac) /* same as readahead_gfp_mask */
> > > > > gfp |= __GFP_NORETRY | __GFP_NOWARN;
> > > > > - ctx->bio = bio_alloc(iomap->bdev, bio_max_segs(nr_vecs),
> > > > > - REQ_OP_READ, gfp);
> > > > > +
> > > > > + ctx->bio = iomap_read_bio_alloc(iter->inode, iomap,
> > > > > + bio_max_segs(DIV_ROUND_UP(length, PAGE_SIZE)),
> > > > > + gfp);
> > > > > +
> > > > > /*
> > > > > * If the bio_alloc fails, try it again for a single page to
> > > > > * avoid having to deal with partial page reads. This emulates
> > > > > * what do_mpage_read_folio does.
> > > > > */
> > > > > if (!ctx->bio) {
> > > > > - ctx->bio = bio_alloc(iomap->bdev, 1, REQ_OP_READ,
> > > > > - orig_gfp);
> > > > > + ctx->bio = iomap_read_bio_alloc(iter->inode,
> > > > > + iomap, 1, orig_gfp);
> > > > > }
> > > > > if (ctx->rac)
> > > > > ctx->bio->bi_opf |= REQ_RAHEAD;
> > > > > ctx->bio->bi_iter.bi_sector = sector;
> > > > > - ctx->bio->bi_end_io = iomap_read_end_io;
> > > > > bio_add_folio_nofail(ctx->bio, folio, plen, poff);
> > > > > }
> > > > >
> > > > > done:
> > > > > + /*
> > > > > + * For post EOF region, zero part of the folio which won't be read. This
> > > > > + * happens at the end of the region. So far, the only user is
> > > > > + * fs-verity which stores continuous data region.
> > > >
> > > > Is it ever the case that the zeroed region actually has merkle tree
> > > > content on disk? Or if this region truly was never written by the
> > > > fsverity construction code, then why would it access the unwritten
> > > > region later?
> > > >
> > > > Or am I misunderstanding something here?
> > > >
> > > > (Probably...)
> > >
> > > The zeroed region is never written. With 1k fs block and 1k merkle
> > > tree block and 4k page size we could end up reading only single
> > > block, at the end of the tree. But we have to pass PAGE to the
> > > fsverity. So, only the 1/4 of the page is read. This if-case zeroes
> > > the rest of the page to make it uptodate. In the normal read path
> > > this is bound by EOF, but here I use tree size. So, we don't read
> > > this unwritten region but zero the folio.
> > >
> > > The fsverity does zeroing of unused space while construction, but
> > > this works only for full fs blocks, therefore, 4k fs block and 1k
> > > merkle tree block.
> >
> > But if the regions you have to zero are outside the merkle tree, then
> > fsverity shouldn't ever see those bytes, so why zero them? Or does it
> > actually check the uptodate bit? So then you want the folio to have
> > well defined contents?
>
> verity doesn't check uptodate, but I do in xfs_fsverity_read_merkle().
> I think we need to check as it tells that iomap is done reading,
> but that's true that zeroing is not necessary.
Ok sounds good to me.
--D
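The 1k-block case discussed above can be modelled in userspace as follows. This is a sketch under stated assumptions (4k pages, positions already masked to tree-relative offsets), and the demo_* names are made up for illustration rather than taken from the patch:

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096u	/* assumption: 4k pages */

/* Bytes to read for a page-sized request at tree offset pos. */
static inline uint64_t demo_clamp_read(uint64_t pos, uint64_t tree_size)
{
	assert(pos < tree_size);
	if (pos + DEMO_PAGE_SIZE < tree_size)
		return DEMO_PAGE_SIZE;
	return tree_size - pos;
}

/* Bytes at the end of the page that no read fills and that get zeroed. */
static inline uint64_t demo_tail_to_zero(uint64_t pos, uint64_t tree_size)
{
	return DEMO_PAGE_SIZE - demo_clamp_read(pos, tree_size);
}
```

For a tree consisting of a single 1k block, only 1024 bytes are read and the remaining 3072 bytes of the page are the tail that the read path zeroes before marking the folio uptodate.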
> >
> > (At this point I'm picking at nits :P)
> >
> > --D
> >
> > > --
> > > - Andrey
> > >
> > >
> >
>
> --
> - Andrey
>
>
^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [PATCH RFC 02/29] iomap: introduce iomap_read/write_region interface
2025-07-29 22:22 ` Darrick J. Wong
@ 2025-07-31 15:51 ` Andrey Albershteyn
2025-08-11 11:43 ` Christoph Hellwig
1 sibling, 0 replies; 75+ messages in thread
From: Andrey Albershteyn @ 2025-07-31 15:51 UTC (permalink / raw)
To: Darrick J. Wong
Cc: fsverity, linux-fsdevel, linux-xfs, david, ebiggers, hch,
Andrey Albershteyn
On 2025-07-29 15:22:52, Darrick J. Wong wrote:
> On Mon, Jul 28, 2025 at 10:30:06PM +0200, Andrey Albershteyn wrote:
> > From: Andrey Albershteyn <aalbersh@redhat.com>
> >
> > Interface for writing data beyond EOF into offsetted region in
> > page cache.
> >
> > Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
> > ---
> > include/linux/iomap.h | 16 ++++++++
> > fs/iomap/buffered-io.c | 99 +++++++++++++++++++++++++++++++++++++++++++++++++-
> > 2 files changed, 114 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> > index 4a0b5ebb79e9..73288f28543f 100644
> > --- a/include/linux/iomap.h
> > +++ b/include/linux/iomap.h
> > @@ -83,6 +83,11 @@ struct vm_fault;
> > */
> > #define IOMAP_F_PRIVATE (1U << 12)
> >
> > +/*
> > + * Writes happens beyound inode EOF
> > + */
> > +#define IOMAP_F_BEYOND_EOF (1U << 13)
> > +
> > /*
> > * Flags set by the core iomap code during operations:
> > *
> > @@ -533,4 +538,15 @@ int iomap_swapfile_activate(struct swap_info_struct *sis,
> >
> > extern struct bio_set iomap_ioend_bioset;
> >
> > +struct ioregion {
> > + struct inode *inode;
> > + loff_t pos; /* IO position */
> > + const void *buf; /* Data to be written (in only) */
> > + size_t length; /* Length of the date */
>
> Length of the data ?
thanks!
>
> > + const struct iomap_ops *ops;
> > +};
>
> This sounds like a kiocb and a kvec...
>
> > +
> > +struct folio *iomap_read_region(struct ioregion *region);
> > +int iomap_write_region(struct ioregion *region);
>
> ...and these sound a lot like filemap_read and iomap_write_iter.
> Why not use those? You'd get readahead for free. Though I guess
> filemap_read cuts off at i_size so maybe that's why this is necessary?
>
> (and by extension, is this why the existing fsverity implementations
> seem to do their own readahead and reading?)
>
> ((and now I guess I see why this isn't done through the regular kiocb
> interface, because then we'd be exposing post-EOF data hiding to
> everyone in the system))
>
> > #endif /* LINUX_IOMAP_H */
> >
> > --
> > 2.50.0
> >
> >
> > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> > index 7bef232254a3..e959a206cba9 100644
> > --- a/fs/iomap/buffered-io.c
> > +++ b/fs/iomap/buffered-io.c
> > @@ -321,6 +321,7 @@ struct iomap_readpage_ctx {
> > bool cur_folio_in_bio;
> > struct bio *bio;
> > struct readahead_control *rac;
> > + int flags;
>
> What flags go in here?
IOMAP_F_BEYOND_EOF
>
> > };
> >
> > /**
> > @@ -387,7 +388,8 @@ static int iomap_readpage_iter(struct iomap_iter *iter,
> > if (plen == 0)
> > goto done;
> >
> > - if (iomap_block_needs_zeroing(iter, pos)) {
> > + if (iomap_block_needs_zeroing(iter, pos) &&
> > + !(iomap->flags & IOMAP_F_BEYOND_EOF)) {
> > folio_zero_range(folio, poff, plen);
> > iomap_set_range_uptodate(folio, poff, plen);
> > goto done;
> > @@ -2007,3 +2009,98 @@ iomap_writepages_unbound(struct address_space *mapping, struct writeback_control
> > return iomap_submit_ioend(wpc, error);
> > }
> > EXPORT_SYMBOL_GPL(iomap_writepages_unbound);
> > +
> > +struct folio *
> > +iomap_read_region(struct ioregion *region)
> > +{
> > + struct inode *inode = region->inode;
> > + fgf_t fgp = FGP_CREAT | FGP_LOCK | fgf_set_order(region->length);
> > + pgoff_t index = (region->pos) >> PAGE_SHIFT;
> > + struct folio *folio = __filemap_get_folio(inode->i_mapping, index, fgp,
> > + mapping_gfp_mask(inode->i_mapping));
> > + int ret;
> > + struct iomap_iter iter = {
> > + .inode = folio->mapping->host,
> > + .pos = region->pos,
> > + .len = region->length,
> > + };
> > + struct iomap_readpage_ctx ctx = {
> > + .cur_folio = folio,
> > + };
> > +
> > + if (folio_test_uptodate(folio)) {
> > + folio_unlock(folio);
> > + return folio;
> > + }
> > +
> > + while ((ret = iomap_iter(&iter, region->ops)) > 0)
> > + iter.status = iomap_read_folio_iter(&iter, &ctx);
>
> Huh, we don't read into region->buf? Oh, I see, this gets iomap to
> install an uptodate folio in the pagecache, and then later we can
> just hand it to fsverity. Maybe?
yes
>
> --D
>
> > +
> > + if (ctx.bio) {
> > + submit_bio(ctx.bio);
> > + WARN_ON_ONCE(!ctx.cur_folio_in_bio);
> > + } else {
> > + WARN_ON_ONCE(ctx.cur_folio_in_bio);
> > + folio_unlock(folio);
> > + }
> > +
> > + return folio;
> > +}
> > +EXPORT_SYMBOL_GPL(iomap_read_region);
> > +
> > +static int iomap_write_region_iter(struct iomap_iter *iter, const void *buf)
> > +{
> > + loff_t pos = iter->pos;
> > + u64 bytes = iomap_length(iter);
> > + int status;
> > +
> > + do {
> > + struct folio *folio;
> > + size_t offset;
> > + bool ret;
>
> Is balance_dirty_pages_ratelimited_flags need here if we're at the dirty
> thresholds?
Ah I see, missed that, I will add it, thanks!
>
> > +
> > + bytes = min_t(u64, SIZE_MAX, bytes);
> > + status = iomap_write_begin(iter, &folio, &offset, &bytes);
> > + if (status)
> > + return status;
> > + if (iter->iomap.flags & IOMAP_F_STALE)
> > + break;
> > +
> > + offset = offset_in_folio(folio, pos);
> > + if (bytes > folio_size(folio) - offset)
> > + bytes = folio_size(folio) - offset;
> > +
> > + memcpy_to_folio(folio, offset, buf, bytes);
> > +
> > + ret = iomap_write_end(iter, bytes, bytes, folio);
> > + if (WARN_ON_ONCE(!ret))
> > + return -EIO;
> > +
> > + __iomap_put_folio(iter, bytes, folio);
> > + if (WARN_ON_ONCE(!ret))
> > + return -EIO;
> > +
> > + status = iomap_iter_advance(iter, &bytes);
> > + if (status)
> > + break;
> > + } while (bytes > 0);
> > +
> > + return status;
> > +}
>
> Hrm, stripped down version of iomap_write_iter without the isize
> updates.
yes, it can probably be merged with a flag to skip the i_size
updates. I can look into this if it makes more sense
>
> --D
>
> > +
> > +int
> > +iomap_write_region(struct ioregion *region)
> > +{
> > + struct iomap_iter iter = {
> > + .inode = region->inode,
> > + .pos = region->pos,
> > + .len = region->length,
> > + };
> > + ssize_t ret;
> > +
> > + while ((ret = iomap_iter(&iter, region->ops)) > 0)
> > + iter.status = iomap_write_region_iter(&iter, region->buf);
> > +
> > + return 0;
> > +}
> > +EXPORT_SYMBOL_GPL(iomap_write_region);
>
--
- Andrey
^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [PATCH RFC 26/29] xfs: fix scrub trace with null pointer in quotacheck
2025-07-31 14:54 ` Andrey Albershteyn
@ 2025-07-31 16:03 ` Carlos Maiolino
0 siblings, 0 replies; 75+ messages in thread
From: Carlos Maiolino @ 2025-07-31 16:03 UTC (permalink / raw)
To: Andrey Albershteyn
Cc: Darrick J. Wong, fsverity, linux-fsdevel, linux-xfs, david,
ebiggers, hch, Andrey Albershteyn
On Thu, Jul 31, 2025 at 04:54:46PM +0200, Andrey Albershteyn wrote:
> On 2025-07-29 08:28:39, Darrick J. Wong wrote:
> > On Mon, Jul 28, 2025 at 10:30:30PM +0200, Andrey Albershteyn wrote:
> > > The quotacheck doesn't initialize sc->ip.
> > >
> > > Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
> >
> > Looks good,
> > Cc: <stable@vger.kernel.org> # v6.8
> > Fixes: 21d7500929c8a0 ("xfs: improve dquot iteration for scrub")
> > Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
>
> @cem, could pick this one? Or if you want I can resend it separately
> with tags
Yes, please, send it again, without the RFC tag, with the rest of
shenanigans :)
>
> --
> - Andrey
>
>
^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [PATCH RFC 01/29] iomap: add iomap_writepages_unbound() to write beyond EOF
2025-07-28 20:30 ` [PATCH RFC 01/29] iomap: add iomap_writepages_unbound() to write beyond EOF Andrey Albershteyn
2025-07-29 22:07 ` Darrick J. Wong
@ 2025-07-31 18:43 ` Joanne Koong
2025-08-04 11:34 ` Andrey Albershteyn
1 sibling, 1 reply; 75+ messages in thread
From: Joanne Koong @ 2025-07-31 18:43 UTC (permalink / raw)
To: Andrey Albershteyn
Cc: fsverity, linux-fsdevel, linux-xfs, david, djwong, ebiggers, hch,
Andrey Albershteyn
On Mon, Jul 28, 2025 at 1:31 PM Andrey Albershteyn <aalbersh@redhat.com> wrote:
>
> From: Andrey Albershteyn <aalbersh@redhat.com>
>
> Add iomap_writepages_unbound() without limit in form of EOF. XFS
> will use this to write metadata (fs-verity Merkle tree) in range far
> beyond EOF.
>
> Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
> ---
> fs/iomap/buffered-io.c | 51 +++++++++++++++++++++++++++++++++++++++-----------
> include/linux/iomap.h | 3 +++
> 2 files changed, 43 insertions(+), 11 deletions(-)
>
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 3729391a18f3..7bef232254a3 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -1881,18 +1881,10 @@ static int iomap_writepage_map(struct iomap_writepage_ctx *wpc,
> int error = 0;
> u32 rlen;
>
> - WARN_ON_ONCE(!folio_test_locked(folio));
> - WARN_ON_ONCE(folio_test_dirty(folio));
> - WARN_ON_ONCE(folio_test_writeback(folio));
> -
> - trace_iomap_writepage(inode, pos, folio_size(folio));
> -
> - if (!iomap_writepage_handle_eof(folio, inode, &end_pos)) {
> - folio_unlock(folio);
> - return 0;
> - }
> WARN_ON_ONCE(end_pos <= pos);
>
> + trace_iomap_writepage(inode, pos, folio_size(folio));
> +
> if (i_blocks_per_folio(inode, folio) > 1) {
> if (!ifs) {
> ifs = ifs_alloc(inode, folio, 0);
> @@ -1956,6 +1948,23 @@ static int iomap_writepage_map(struct iomap_writepage_ctx *wpc,
> return error;
> }
>
> +/* Map pages bound by EOF */
> +static int iomap_writepage_map_eof(struct iomap_writepage_ctx *wpc,
> + struct writeback_control *wbc, struct folio *folio)
> +{
> + int error;
> + struct inode *inode = folio->mapping->host;
> + u64 end_pos = folio_pos(folio) + folio_size(folio);
> +
> + if (!iomap_writepage_handle_eof(folio, inode, &end_pos)) {
> + folio_unlock(folio);
> + return 0;
> + }
> +
> + error = iomap_writepage_map(wpc, wbc, folio);
> + return error;
> +}
> +
> int
> iomap_writepages(struct address_space *mapping, struct writeback_control *wbc,
> struct iomap_writepage_ctx *wpc,
> @@ -1972,9 +1981,29 @@ iomap_writepages(struct address_space *mapping, struct writeback_control *wbc,
> PF_MEMALLOC))
> return -EIO;
>
> + wpc->ops = ops;
> + while ((folio = writeback_iter(mapping, wbc, folio, &error))) {
> + WARN_ON_ONCE(!folio_test_locked(folio));
> + WARN_ON_ONCE(folio_test_dirty(folio));
> + WARN_ON_ONCE(folio_test_writeback(folio));
> +
> + error = iomap_writepage_map_eof(wpc, wbc, folio);
> + }
> + return iomap_submit_ioend(wpc, error);
> +}
> +EXPORT_SYMBOL_GPL(iomap_writepages);
> +
> +int
> +iomap_writepages_unbound(struct address_space *mapping, struct writeback_control *wbc,
> + struct iomap_writepage_ctx *wpc,
> + const struct iomap_writeback_ops *ops)
> +{
> + struct folio *folio = NULL;
> + int error;
> +
> wpc->ops = ops;
> while ((folio = writeback_iter(mapping, wbc, folio, &error)))
> error = iomap_writepage_map(wpc, wbc, folio);
> return iomap_submit_ioend(wpc, error);
> }
> -EXPORT_SYMBOL_GPL(iomap_writepages);
> +EXPORT_SYMBOL_GPL(iomap_writepages_unbound);
> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index 522644d62f30..4a0b5ebb79e9 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -464,6 +464,9 @@ void iomap_sort_ioends(struct list_head *ioend_list);
> int iomap_writepages(struct address_space *mapping,
> struct writeback_control *wbc, struct iomap_writepage_ctx *wpc,
> const struct iomap_writeback_ops *ops);
> +int iomap_writepages_unbound(struct address_space *mapping,
> + struct writeback_control *wbc, struct iomap_writepage_ctx *wpc,
> + const struct iomap_writeback_ops *ops);
>
Just curious, instead of having a new api for
iomap_writepages_unbound, does adding a bitfield for unbound to the
iomap_writepage_ctx struct suffice? afaict, the logic between the two
paths is identical except for the iomap_writepage_handle_eof() call
and some WARN_ONs - if that gets gated behind the bitfield check, then
it seems like it does the same thing logically, but imo the code flow
is more straightforward to follow. But maybe I'm missing some
reason why this wouldn't work?
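Roughly what I have in mind, as a simplified userspace model rather than a patch. struct demo_wpc and demo_writepage_prepare() are hypothetical stand-ins; the real iomap_writepage_handle_eof() does more than trim end_pos (it also zeroes the folio tail):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for iomap_writepage_ctx; only the new bit matters. */
struct demo_wpc {
	unsigned int unbound : 1;	/* skip the EOF check when set */
};

/*
 * Returns true if the folio covering [pos, *end_pos) should be written.
 * When bound by EOF, *end_pos is trimmed to isize.
 */
static inline bool demo_writepage_prepare(const struct demo_wpc *wpc,
					  uint64_t isize, uint64_t pos,
					  uint64_t *end_pos)
{
	if (wpc->unbound)
		return true;
	if (pos >= isize)
		return false;		/* folio is entirely past EOF */
	if (*end_pos > isize)
		*end_pos = isize;	/* folio straddles EOF */
	return true;
}
```

With the bit clear, a folio starting at or past i_size is skipped and a straddling folio is trimmed; with the bit set, the same folio is written in full, so a single iomap_writepages() could serve both callers.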
> /*
> * Flags for direct I/O ->end_io:
>
> --
> 2.50.0
>
>
^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [PATCH RFC 01/29] iomap: add iomap_writepages_unbound() to write beyond EOF
2025-07-31 18:43 ` Joanne Koong
@ 2025-08-04 11:34 ` Andrey Albershteyn
0 siblings, 0 replies; 75+ messages in thread
From: Andrey Albershteyn @ 2025-08-04 11:34 UTC (permalink / raw)
To: Joanne Koong
Cc: fsverity, linux-fsdevel, linux-xfs, david, djwong, ebiggers, hch,
Andrey Albershteyn
On 2025-07-31 11:43:52, Joanne Koong wrote:
> On Mon, Jul 28, 2025 at 1:31 PM Andrey Albershteyn <aalbersh@redhat.com> wrote:
> >
> > From: Andrey Albershteyn <aalbersh@redhat.com>
> >
> > Add iomap_writepages_unbound() without limit in form of EOF. XFS
> > will use this to write metadata (fs-verity Merkle tree) in range far
> > beyond EOF.
> >
> > Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
> > ---
> > fs/iomap/buffered-io.c | 51 +++++++++++++++++++++++++++++++++++++++-----------
> > include/linux/iomap.h | 3 +++
> > 2 files changed, 43 insertions(+), 11 deletions(-)
> >
> > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> > index 3729391a18f3..7bef232254a3 100644
> > --- a/fs/iomap/buffered-io.c
> > +++ b/fs/iomap/buffered-io.c
> > @@ -1881,18 +1881,10 @@ static int iomap_writepage_map(struct iomap_writepage_ctx *wpc,
> > int error = 0;
> > u32 rlen;
> >
> > - WARN_ON_ONCE(!folio_test_locked(folio));
> > - WARN_ON_ONCE(folio_test_dirty(folio));
> > - WARN_ON_ONCE(folio_test_writeback(folio));
> > -
> > - trace_iomap_writepage(inode, pos, folio_size(folio));
> > -
> > - if (!iomap_writepage_handle_eof(folio, inode, &end_pos)) {
> > - folio_unlock(folio);
> > - return 0;
> > - }
> > WARN_ON_ONCE(end_pos <= pos);
> >
> > + trace_iomap_writepage(inode, pos, folio_size(folio));
> > +
> > if (i_blocks_per_folio(inode, folio) > 1) {
> > if (!ifs) {
> > ifs = ifs_alloc(inode, folio, 0);
> > @@ -1956,6 +1948,23 @@ static int iomap_writepage_map(struct iomap_writepage_ctx *wpc,
> > return error;
> > }
> >
> > +/* Map pages bound by EOF */
> > +static int iomap_writepage_map_eof(struct iomap_writepage_ctx *wpc,
> > + struct writeback_control *wbc, struct folio *folio)
> > +{
> > + int error;
> > + struct inode *inode = folio->mapping->host;
> > + u64 end_pos = folio_pos(folio) + folio_size(folio);
> > +
> > + if (!iomap_writepage_handle_eof(folio, inode, &end_pos)) {
> > + folio_unlock(folio);
> > + return 0;
> > + }
> > +
> > + error = iomap_writepage_map(wpc, wbc, folio);
> > + return error;
> > +}
> > +
> > int
> > iomap_writepages(struct address_space *mapping, struct writeback_control *wbc,
> > struct iomap_writepage_ctx *wpc,
> > @@ -1972,9 +1981,29 @@ iomap_writepages(struct address_space *mapping, struct writeback_control *wbc,
> > PF_MEMALLOC))
> > return -EIO;
> >
> > + wpc->ops = ops;
> > + while ((folio = writeback_iter(mapping, wbc, folio, &error))) {
> > + WARN_ON_ONCE(!folio_test_locked(folio));
> > + WARN_ON_ONCE(folio_test_dirty(folio));
> > + WARN_ON_ONCE(folio_test_writeback(folio));
> > +
> > + error = iomap_writepage_map_eof(wpc, wbc, folio);
> > + }
> > + return iomap_submit_ioend(wpc, error);
> > +}
> > +EXPORT_SYMBOL_GPL(iomap_writepages);
> > +
> > +int
> > +iomap_writepages_unbound(struct address_space *mapping, struct writeback_control *wbc,
> > + struct iomap_writepage_ctx *wpc,
> > + const struct iomap_writeback_ops *ops)
> > +{
> > + struct folio *folio = NULL;
> > + int error;
> > +
> > wpc->ops = ops;
> > while ((folio = writeback_iter(mapping, wbc, folio, &error)))
> > error = iomap_writepage_map(wpc, wbc, folio);
> > return iomap_submit_ioend(wpc, error);
> > }
> > -EXPORT_SYMBOL_GPL(iomap_writepages);
> > +EXPORT_SYMBOL_GPL(iomap_writepages_unbound);
> > diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> > index 522644d62f30..4a0b5ebb79e9 100644
> > --- a/include/linux/iomap.h
> > +++ b/include/linux/iomap.h
> > @@ -464,6 +464,9 @@ void iomap_sort_ioends(struct list_head *ioend_list);
> > int iomap_writepages(struct address_space *mapping,
> > struct writeback_control *wbc, struct iomap_writepage_ctx *wpc,
> > const struct iomap_writeback_ops *ops);
> > +int iomap_writepages_unbound(struct address_space *mapping,
> > + struct writeback_control *wbc, struct iomap_writepage_ctx *wpc,
> > + const struct iomap_writeback_ops *ops);
> >
>
> Just curious, instead of having a new api for
> iomap_writepages_unbound, does adding a bitfield for unbound to the
> iomap_writepage_ctx struct suffice? afaict, the logic between the two
> paths is identical except for the iomap_writepage_handle_eof() call
> and some WARN_ONs - if that gets gated behind the bitfield check, then
> it seems like it does the same thing logically but imo is more
> straightforward to follow the code flow of. But maybe I'm missing some
> reason why this wouldn't work?
Yeah, that's another option. If others also prefer flag then I can
change it.
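For reference, a minimal userspace sketch of the flag-gated variant
discussed above. It is not the kernel code: struct wpc_model, the
"unbound" field and both model_* helpers are hypothetical stand-ins, and
the EOF handling is reduced to "skip folios past i_size, clamp the end
otherwise". The point is only that one check can separate the two paths.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct iomap_writepage_ctx. */
struct wpc_model {
	bool unbound;	/* assumed new bitfield: skip the EOF handling */
};

/*
 * Much-simplified model of iomap_writepage_handle_eof(): returns false
 * when the folio starts at or beyond i_size, otherwise clamps *end_pos.
 */
static bool model_handle_eof(uint64_t pos, uint64_t isize, uint64_t *end_pos)
{
	if (pos >= isize)
		return false;
	if (*end_pos > isize)
		*end_pos = isize;
	return true;
}

/* Returns how many bytes of the folio would be mapped for writeback. */
static uint64_t model_writepage_map(const struct wpc_model *wpc,
				    uint64_t pos, uint64_t len, uint64_t isize)
{
	uint64_t end_pos = pos + len;

	/* The only difference between the two paths is this gate. */
	if (!wpc->unbound && !model_handle_eof(pos, isize, &end_pos))
		return 0;

	assert(end_pos > pos);	/* models WARN_ON_ONCE(end_pos <= pos) */
	return end_pos - pos;
}
```

With the flag clear, a folio past i_size maps nothing; with it set, the
same folio is mapped in full, which is what a post-EOF Merkle tree
write would need.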
--
- Andrey
* Re: [PATCH RFC 02/29] iomap: introduce iomap_read/write_region interface
2025-07-29 22:22 ` Darrick J. Wong
2025-07-31 15:51 ` Andrey Albershteyn
@ 2025-08-11 11:43 ` Christoph Hellwig
1 sibling, 0 replies; 75+ messages in thread
From: Christoph Hellwig @ 2025-08-11 11:43 UTC (permalink / raw)
To: Darrick J. Wong
Cc: Andrey Albershteyn, fsverity, linux-fsdevel, linux-xfs, david,
ebiggers, hch, Andrey Albershteyn
On Tue, Jul 29, 2025 at 03:22:52PM -0700, Darrick J. Wong wrote:
> ...and these sound a lot like filemap_read and iomap_write_iter.
> Why not use those? You'd get readahead for free. Though I guess
> filemap_read cuts off at i_size so maybe that's why this is necessary?
>
> (and by extension, is this why the existing fsverity implementations
> seem to do their own readahead and reading?)
>
> ((and now I guess I see why this isn't done through the regular kiocb
> interface, because then we'd be exposing post-EOF data hiding to
> everyone in the system))
Same thoughts here. It seems like we should just have a beyond-EOF or
fsverity flag for ->read_iter / ->write_iter and consolidate all this
code. That'll also go along nicely with the flag in the writepage_ctx
suggested by Joanne.
* Re: [PATCH RFC 04/29] fsverity: add per-sb workqueue for post read processing
2025-07-28 20:30 ` [PATCH RFC 04/29] fsverity: add per-sb workqueue for post read processing Andrey Albershteyn
@ 2025-08-11 11:45 ` Christoph Hellwig
2025-08-11 17:51 ` Tejun Heo
0 siblings, 1 reply; 75+ messages in thread
From: Christoph Hellwig @ 2025-08-11 11:45 UTC (permalink / raw)
To: Andrey Albershteyn
Cc: fsverity, linux-fsdevel, linux-xfs, david, djwong, ebiggers, hch,
Tejun Heo, Lai Jiangshan, linux-kernel
On Mon, Jul 28, 2025 at 10:30:08PM +0200, Andrey Albershteyn wrote:
> From: Andrey Albershteyn <aalbersh@redhat.com>
>
> For XFS, fsverity's global workqueue is not really suitable due to:
>
> 1. High priority workqueues are used within XFS to ensure that data
> IO completion cannot stall processing of journal IO completions.
> Hence using a WQ_HIGHPRI workqueue directly in the user data IO
> path is a potential filesystem livelock/deadlock vector.
Do they? I thought the whole point of WQ_HIGHPRI was that they'd
have separate rescue workers to avoid any global pool effects.
> 2. The fsverity workqueue is global - it creates a cross-filesystem
> contention point.
How does this not affect the other file systems?
If the global workqueue is such an issue, maybe it should be addressed
in an initial series before the xfs support?
* Re: [PATCH RFC 06/29] fsverity: report validation errors back to the filesystem
2025-07-28 20:30 ` [PATCH RFC 06/29] fsverity: report validation errors back to the filesystem Andrey Albershteyn
@ 2025-08-11 11:46 ` Christoph Hellwig
2025-08-11 15:31 ` Darrick J. Wong
0 siblings, 1 reply; 75+ messages in thread
From: Christoph Hellwig @ 2025-08-11 11:46 UTC (permalink / raw)
To: Andrey Albershteyn
Cc: fsverity, linux-fsdevel, linux-xfs, david, djwong, ebiggers, hch
On Mon, Jul 28, 2025 at 10:30:10PM +0200, Andrey Albershteyn wrote:
> From: "Darrick J. Wong" <djwong@kernel.org>
>
> Provide a new function call so that validation errors can be reported
> back to the filesystem.
This feels like an awfully generic name for a very specific error
condition.
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Btw, this is missing a signoff from Andrey as the series submitter
needs to sign off after the author.
* Re: [PATCH RFC 12/29] fsverity: expose merkle tree geometry to callers
2025-07-28 20:30 ` [PATCH RFC 12/29] fsverity: expose merkle tree geometry to callers Andrey Albershteyn
@ 2025-08-11 11:48 ` Christoph Hellwig
2025-08-11 15:38 ` Darrick J. Wong
0 siblings, 1 reply; 75+ messages in thread
From: Christoph Hellwig @ 2025-08-11 11:48 UTC (permalink / raw)
To: Andrey Albershteyn
Cc: fsverity, linux-fsdevel, linux-xfs, david, djwong, ebiggers, hch
On Mon, Jul 28, 2025 at 10:30:16PM +0200, Andrey Albershteyn wrote:
> From: "Darrick J. Wong" <djwong@kernel.org>
>
> Create a function that will return selected information about the
> geometry of the merkle tree. Online fsck for XFS will need this piece
> to perform basic checks of the merkle tree.
Just curious, why does xfs need this, but the existing file systems
don't? That would be some good background information for the commit
message.
> + if (!IS_VERITY(inode))
> + return -ENODATA;
> +
> + error = ensure_verity_info(inode);
> + if (error)
> + return error;
> +
> + vi = inode->i_verity_info;
Wouldn't it be a better interface to return the verity_info from
ensure_verity_info (NULL for !IS_VERITY, ERR_PTR for real error)
and then just look at the fields directly?
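A rough userspace sketch of the three-state return convention suggested
here (NULL for "not a verity inode", an error pointer for a real
failure, a valid pointer otherwise). The ERR_PTR helpers below are local
stand-ins modelled on the kernel's include/linux/err.h; the *_model
structs and functions are hypothetical, and the sketch assumes the
caller is allowed to see the struct fields.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical, heavily trimmed models of the verity info and inode. */
struct verity_info_model {
	unsigned int log_blocksize;
	uint64_t tree_size;
};

struct inode_model {
	int is_verity;			/* models IS_VERITY(inode) */
	int load_error;			/* nonzero errno: loading the info fails */
	struct verity_info_model *vi;
};

/* NULL: not a verity file; ERR_PTR: real error; otherwise the info. */
static struct verity_info_model *model_get_verity_info(struct inode_model *inode)
{
	if (!inode->is_verity)
		return NULL;
	if (inode->load_error)
		return ERR_PTR(-inode->load_error);
	return inode->vi;
}

/* Example caller that reads a field directly from the returned info. */
static int model_get_tree_size(struct inode_model *inode, uint64_t *tree_size)
{
	struct verity_info_model *vi = model_get_verity_info(inode);

	assert(tree_size != NULL);
	if (!vi)
		return -ENODATA;
	if (IS_ERR(vi))
		return (int)PTR_ERR(vi);
	*tree_size = vi->tree_size;
	return 0;
}
```

The caller still has to distinguish all three cases, but it no longer
needs a dedicated accessor per field.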
* Re: [PATCH RFC 14/29] xfs: add attribute type for fs-verity
2025-07-28 20:30 ` [PATCH RFC 14/29] xfs: add attribute type for fs-verity Andrey Albershteyn
@ 2025-08-11 11:50 ` Christoph Hellwig
2025-08-11 19:00 ` Andrey Albershteyn
0 siblings, 1 reply; 75+ messages in thread
From: Christoph Hellwig @ 2025-08-11 11:50 UTC (permalink / raw)
To: Andrey Albershteyn
Cc: fsverity, linux-fsdevel, linux-xfs, david, djwong, ebiggers, hch,
Andrey Albershteyn
On Mon, Jul 28, 2025 at 10:30:18PM +0200, Andrey Albershteyn wrote:
> From: Andrey Albershteyn <aalbersh@redhat.com>
>
> The fsverity descriptor is stored in the extended attributes of the
> inode. Add new attribute type for fs-verity metadata. Add
> XFS_ATTR_INTERNAL_MASK to skip parent pointer and fs-verity attributes
> as those are only for internal use. While we're at it add a few comments
> in relevant places that internally visible attributes are not supposed to
> be handled via interface defined in xfs_xattr.c.
So ext4 and others seem to place the descriptor just before the verity
data. What is the benefit of an attr?
* Re: [PATCH RFC 06/29] fsverity: report validation errors back to the filesystem
2025-08-11 11:46 ` Christoph Hellwig
@ 2025-08-11 15:31 ` Darrick J. Wong
2025-08-12 7:34 ` Christoph Hellwig
0 siblings, 1 reply; 75+ messages in thread
From: Darrick J. Wong @ 2025-08-11 15:31 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Andrey Albershteyn, fsverity, linux-fsdevel, linux-xfs, david,
ebiggers
On Mon, Aug 11, 2025 at 01:46:03PM +0200, Christoph Hellwig wrote:
> On Mon, Jul 28, 2025 at 10:30:10PM +0200, Andrey Albershteyn wrote:
> > From: "Darrick J. Wong" <djwong@kernel.org>
> >
> > Provide a new function call so that validation errors can be reported
> > back to the filesystem.
>
> This feels like an awfully generic name for a very specific error
> condition.
<shrug> ->verity_failure?
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
>
> Btw, this is missing a signoff from Andrey as the series submitter
> needs to sign off after the author.
Yes.
--D
* Re: [PATCH RFC 12/29] fsverity: expose merkle tree geometry to callers
2025-08-11 11:48 ` Christoph Hellwig
@ 2025-08-11 15:38 ` Darrick J. Wong
2025-08-11 19:06 ` Andrey Albershteyn
2025-08-12 7:42 ` Christoph Hellwig
0 siblings, 2 replies; 75+ messages in thread
From: Darrick J. Wong @ 2025-08-11 15:38 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Andrey Albershteyn, fsverity, linux-fsdevel, linux-xfs, david,
ebiggers
On Mon, Aug 11, 2025 at 01:48:13PM +0200, Christoph Hellwig wrote:
> On Mon, Jul 28, 2025 at 10:30:16PM +0200, Andrey Albershteyn wrote:
> > From: "Darrick J. Wong" <djwong@kernel.org>
> >
> > Create a function that will return selected information about the
> > geometry of the merkle tree. Online fsck for XFS will need this piece
> > to perform basic checks of the merkle tree.
>
> Just curious, why does xfs need this, but the existing file systems
> don't? That would be some good background information for the commit
> message.
Hrmmm... the last time I sent this RFC, online fsck used it to check the
validity of the merkle tree xattrs.
I think you could also use it to locate the merkle tree at the highest
possible offset in the data fork, though IIRC Andrey decided to pin it
at 1<<53.
(I think ext4 just opencodes the logic everywhere...)
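To make the two placement schemes concrete, a small userspace sketch.
The ext4-style helper assumes the Merkle tree starts at the next 64K
boundary past i_size, which is how ext4 lays out its verity metadata;
the fixed variant uses the 1 << 53 offset mentioned in this thread. All
names here are hypothetical and not taken from either filesystem.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_EXT4_ALIGN	UINT64_C(65536)
#define MODEL_FIXED_OFFSET	(UINT64_C(1) << 53)	/* offset from this thread */

/* ext4-style: tree position depends on the file size. */
static uint64_t model_ext4_style_pos(uint64_t i_size)
{
	return (i_size + MODEL_EXT4_ALIGN - 1) & ~(MODEL_EXT4_ALIGN - 1);
}

/* Pinned: tree position is constant, file data must stay below it. */
static uint64_t model_fixed_pos(uint64_t i_size)
{
	assert(i_size <= MODEL_FIXED_OFFSET);
	return MODEL_FIXED_OFFSET;
}
```

The pinned offset keeps the lookup independent of i_size, at the cost of
capping the file size that can carry a tree.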
> > + if (!IS_VERITY(inode))
> > + return -ENODATA;
> > +
> > + error = ensure_verity_info(inode);
> > + if (error)
> > + return error;
> > +
> > + vi = inode->i_verity_info;
>
> Wouldn't it be a better interface to return the verity_info from
> ensure_verity_info (NULL for !IS_VERITY, ERR_PTR for real error)
> and then just look at the fields directly?
They're private to fsverity_private.h.
--D
* Re: [PATCH RFC 04/29] fsverity: add per-sb workqueue for post read processing
2025-08-11 11:45 ` Christoph Hellwig
@ 2025-08-11 17:51 ` Tejun Heo
2025-08-12 7:43 ` Christoph Hellwig
0 siblings, 1 reply; 75+ messages in thread
From: Tejun Heo @ 2025-08-11 17:51 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Andrey Albershteyn, fsverity, linux-fsdevel, linux-xfs, david,
djwong, ebiggers, Lai Jiangshan, linux-kernel
Hello,
On Mon, Aug 11, 2025 at 01:45:19PM +0200, Christoph Hellwig wrote:
> On Mon, Jul 28, 2025 at 10:30:08PM +0200, Andrey Albershteyn wrote:
> > From: Andrey Albershteyn <aalbersh@redhat.com>
> >
> > For XFS, fsverity's global workqueue is not really suitable due to:
> >
> > 1. High priority workqueues are used within XFS to ensure that data
> > IO completion cannot stall processing of journal IO completions.
> > Hence using a WQ_HIGHPRI workqueue directly in the user data IO
> > path is a potential filesystem livelock/deadlock vector.
>
> Do they? I thought the whole point of WQ_HIGHPRI was that they'd
> have separate rescue workers to avoid any global pool effects.
HIGHPRI and MEM_RECLAIM are orthogonal. HIGHPRI makes the workqueue use
worker pools with high priority, so all work items would execute at MIN_NICE
(-20). Hmm... actually, rescuer doesn't set priority according to the
workqueue's, which seems buggy.
Thanks.
--
tejun
* Re: [PATCH RFC 14/29] xfs: add attribute type for fs-verity
2025-08-11 11:50 ` Christoph Hellwig
@ 2025-08-11 19:00 ` Andrey Albershteyn
2025-08-12 7:44 ` Christoph Hellwig
0 siblings, 1 reply; 75+ messages in thread
From: Andrey Albershteyn @ 2025-08-11 19:00 UTC (permalink / raw)
To: Christoph Hellwig
Cc: fsverity, linux-fsdevel, linux-xfs, david, djwong, ebiggers,
Andrey Albershteyn
On 2025-08-11 13:50:23, Christoph Hellwig wrote:
> On Mon, Jul 28, 2025 at 10:30:18PM +0200, Andrey Albershteyn wrote:
> > From: Andrey Albershteyn <aalbersh@redhat.com>
> >
> > The fsverity descriptor is stored in the extended attributes of the
> > inode. Add new attribute type for fs-verity metadata. Add
> > XFS_ATTR_INTERNAL_MASK to skip parent pointer and fs-verity attributes
> > as those are only for internal use. While we're at it add a few comments
> > in relevant places that internally visible attributes are not supposed to
> > be handled via interface defined in xfs_xattr.c.
>
> So ext4 and others seem to place the descriptor just before the verity
> data. What is the benefit of an attr?
>
Mostly because it was already implemented. But looking for benefits,
attr can be inode LOCAL so a bit of saved space? Also, seems like a
better interface than looking at a magic offset.
--
- Andrey
* Re: [PATCH RFC 12/29] fsverity: expose merkle tree geometry to callers
2025-08-11 15:38 ` Darrick J. Wong
@ 2025-08-11 19:06 ` Andrey Albershteyn
2025-08-12 7:42 ` Christoph Hellwig
1 sibling, 0 replies; 75+ messages in thread
From: Andrey Albershteyn @ 2025-08-11 19:06 UTC (permalink / raw)
To: Darrick J. Wong
Cc: Christoph Hellwig, fsverity, linux-fsdevel, linux-xfs, david,
ebiggers
On 2025-08-11 08:38:22, Darrick J. Wong wrote:
> On Mon, Aug 11, 2025 at 01:48:13PM +0200, Christoph Hellwig wrote:
> > On Mon, Jul 28, 2025 at 10:30:16PM +0200, Andrey Albershteyn wrote:
> > > From: "Darrick J. Wong" <djwong@kernel.org>
> > >
> > > Create a function that will return selected information about the
> > > geometry of the merkle tree. Online fsck for XFS will need this piece
> > > to perform basic checks of the merkle tree.
> >
> > Just curious, why does xfs need this, but the existing file systems
> > don't? That would be some good background information for the commit
> > message.
>
> Hrmmm... the last time I sent this RFC, online fsck used it to check the
> validity of the merkle tree xattrs.
>
> I think you could also use it to locate the merkle tree at the highest
> possible offset in the data fork, though IIRC Andrey decided to pin it
> at 1<<53.
I also use it in a few places to get tree_size, which is used to adjust
the read size (xfs_fsverity_adjust_read and
iomap_fsverity_tree_end_align).
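As an illustration of where a tree_size value comes from, a userspace
sketch that sums the blocks of each Merkle tree level, following the
tree construction described in the fs-verity documentation. The function
name and parameters are hypothetical; a 32-byte digest corresponds to
SHA-256, and the sketch assumes at least two hashes fit in a block.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Each tree block holds block_size / digest_size hashes. Level 0 hashes
 * the data blocks, each further level hashes the level below, until a
 * single block remains. A file of at most one block has an empty tree.
 */
static uint64_t model_merkle_tree_size(uint64_t file_size,
				       uint32_t block_size,
				       uint32_t digest_size)
{
	uint64_t hashes_per_block = block_size / digest_size;
	uint64_t blocks = (file_size + block_size - 1) / block_size;
	uint64_t tree_blocks = 0;

	assert(hashes_per_block >= 2);	/* otherwise the loop cannot converge */

	while (blocks > 1) {
		blocks = (blocks + hashes_per_block - 1) / hashes_per_block;
		tree_blocks += blocks;
	}
	return tree_blocks * block_size;
}
```

For a 1 MiB file with 4K blocks and SHA-256 this gives three tree
blocks (two at level 0, one root block), i.e. 12288 bytes.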
>
> (I think ext4 just opencodes the logic everywhere...)
>
> > > + if (!IS_VERITY(inode))
> > > + return -ENODATA;
> > > +
> > > + error = ensure_verity_info(inode);
> > > + if (error)
> > > + return error;
> > > +
> > > + vi = inode->i_verity_info;
> >
> > Wouldn't it be a better interface to return the verity_info from
> > ensure_verity_info (NULL for !IS_VERITY, ERR_PTR for real error)
> > and then just look at the fields directly?
>
> They're private to fsverity_private.h.
>
> --D
>
--
- Andrey
* Re: [PATCH RFC 06/29] fsverity: report validation errors back to the filesystem
2025-08-11 15:31 ` Darrick J. Wong
@ 2025-08-12 7:34 ` Christoph Hellwig
2025-08-12 7:56 ` Christoph Hellwig
0 siblings, 1 reply; 75+ messages in thread
From: Christoph Hellwig @ 2025-08-12 7:34 UTC (permalink / raw)
To: Darrick J. Wong
Cc: Christoph Hellwig, Andrey Albershteyn, fsverity, linux-fsdevel,
linux-xfs, david, ebiggers
On Mon, Aug 11, 2025 at 08:31:42AM -0700, Darrick J. Wong wrote:
> On Mon, Aug 11, 2025 at 01:46:03PM +0200, Christoph Hellwig wrote:
> > On Mon, Jul 28, 2025 at 10:30:10PM +0200, Andrey Albershteyn wrote:
> > > From: "Darrick J. Wong" <djwong@kernel.org>
> > >
> > > Provide a new function call so that validation errors can be reported
> > > back to the filesystem.
> >
> > This feels like an awfully generic name for a very specific error
> > condition.
>
> <shrug> ->verity_failure?
Either that or make it generic. Either way it needs to be documented
in the usual places that document file operations including explaining
when it is supposed to be called.
* Re: [PATCH RFC 12/29] fsverity: expose merkle tree geometry to callers
2025-08-11 15:38 ` Darrick J. Wong
2025-08-11 19:06 ` Andrey Albershteyn
@ 2025-08-12 7:42 ` Christoph Hellwig
2025-08-12 19:09 ` Darrick J. Wong
1 sibling, 1 reply; 75+ messages in thread
From: Christoph Hellwig @ 2025-08-12 7:42 UTC (permalink / raw)
To: Darrick J. Wong
Cc: Christoph Hellwig, Andrey Albershteyn, fsverity, linux-fsdevel,
linux-xfs, david, ebiggers
On Mon, Aug 11, 2025 at 08:38:22AM -0700, Darrick J. Wong wrote:
> > Just curious, why does xfs need this, but the existing file systems
> > don't? That would be some good background information for the commit
> > message.
>
> Hrmmm... the last time I sent this RFC, online fsck used it to check the
> validity of the merkle tree xattrs.
I saw a few users, so it does get used. But patches exporting something
should in general document what the use case is.
> > > + if (!IS_VERITY(inode))
> > > + return -ENODATA;
> > > +
> > > + error = ensure_verity_info(inode);
> > > + if (error)
> > > + return error;
> > > +
> > > + vi = inode->i_verity_info;
> >
> > Wouldn't it be a better interface to return the verity_info from
> > ensure_verity_info (NULL for !IS_VERITY, ERR_PTR for real error)
> > and then just look at the fields directly?
>
> They're private to fsverity_private.h.
Indeed. Is ensure_verity_info even the right thing here? I.e.
should querying the parameters create the info if it wasn't there
yet?
* Re: [PATCH RFC 04/29] fsverity: add per-sb workqueue for post read processing
2025-08-11 17:51 ` Tejun Heo
@ 2025-08-12 7:43 ` Christoph Hellwig
2025-08-12 19:52 ` Tejun Heo
0 siblings, 1 reply; 75+ messages in thread
From: Christoph Hellwig @ 2025-08-12 7:43 UTC (permalink / raw)
To: Tejun Heo
Cc: Christoph Hellwig, Andrey Albershteyn, fsverity, linux-fsdevel,
linux-xfs, david, djwong, ebiggers, Lai Jiangshan, linux-kernel
> On Mon, Aug 11, 2025 at 01:45:19PM +0200, Christoph Hellwig wrote:
> > On Mon, Jul 28, 2025 at 10:30:08PM +0200, Andrey Albershteyn wrote:
> > > From: Andrey Albershteyn <aalbersh@redhat.com>
> > >
> > > For XFS, fsverity's global workqueue is not really suitable due to:
> > >
> > > 1. High priority workqueues are used within XFS to ensure that data
> > > IO completion cannot stall processing of journal IO completions.
> > > Hence using a WQ_HIGHPRI workqueue directly in the user data IO
> > > path is a potential filesystem livelock/deadlock vector.
> >
> > Do they? I thought the whole point of WQ_HIGHPRI was that they'd
> > have separate rescue workers to avoid any global pool effects.
>
> HIGHPRI and MEM_RECLAIM are orthogonal. HIGHPRI makes the workqueue use
> worker pools with high priority, so all work items would execute at MIN_NICE
> (-20). Hmm... actually, rescuer doesn't set priority according to the
> workqueue's, which seems buggy.
Andrey (or others involved with previous versions): is interference
with the log completion workqueue what you ran into?
Tejun, are you going to prepare a patch to fix the rescuer priority?
* Re: [PATCH RFC 14/29] xfs: add attribute type for fs-verity
2025-08-11 19:00 ` Andrey Albershteyn
@ 2025-08-12 7:44 ` Christoph Hellwig
2025-08-12 17:11 ` Andrey Albershteyn
0 siblings, 1 reply; 75+ messages in thread
From: Christoph Hellwig @ 2025-08-12 7:44 UTC (permalink / raw)
To: Andrey Albershteyn
Cc: Christoph Hellwig, fsverity, linux-fsdevel, linux-xfs, david,
djwong, ebiggers, Andrey Albershteyn
On Mon, Aug 11, 2025 at 09:00:29PM +0200, Andrey Albershteyn wrote:
> Mostly because it was already implemented. But looking for benefits,
> attr can be inode LOCAL so a bit of saved space? Also, seems like a
> better interface than to look at a magic offset
Well, can you document the rationale somewhere?
* Re: [PATCH RFC 03/29] fs: add FS_XFLAG_VERITY for verity files
2025-07-28 20:30 ` [PATCH RFC 03/29] fs: add FS_XFLAG_VERITY for verity files Andrey Albershteyn
2025-07-29 9:53 ` Amir Goldstein
@ 2025-08-12 7:51 ` Christoph Hellwig
1 sibling, 0 replies; 75+ messages in thread
From: Christoph Hellwig @ 2025-08-12 7:51 UTC (permalink / raw)
To: Andrey Albershteyn
Cc: fsverity, linux-fsdevel, linux-xfs, david, djwong, ebiggers, hch,
Andrey Albershteyn
On Mon, Jul 28, 2025 at 10:30:07PM +0200, Andrey Albershteyn wrote:
> From: Andrey Albershteyn <aalbersh@redhat.com>
>
> Add extended attribute FS_XFLAG_VERITY for inodes with fs-verity
> enabled.
This is independent of the XFS support and just offers the verity
bit over the formerly XFS and now VFS fsxattr interface, right? If so,
can you just send it off to Eric after writing a more complete
commit log explaining this?
* Re: [PATCH RFC 06/29] fsverity: report validation errors back to the filesystem
2025-08-12 7:34 ` Christoph Hellwig
@ 2025-08-12 7:56 ` Christoph Hellwig
0 siblings, 0 replies; 75+ messages in thread
From: Christoph Hellwig @ 2025-08-12 7:56 UTC (permalink / raw)
To: Darrick J. Wong
Cc: Christoph Hellwig, Andrey Albershteyn, fsverity, linux-fsdevel,
linux-xfs, david, ebiggers
On Tue, Aug 12, 2025 at 09:34:55AM +0200, Christoph Hellwig wrote:
> On Mon, Aug 11, 2025 at 08:31:42AM -0700, Darrick J. Wong wrote:
> > On Mon, Aug 11, 2025 at 01:46:03PM +0200, Christoph Hellwig wrote:
> > > On Mon, Jul 28, 2025 at 10:30:10PM +0200, Andrey Albershteyn wrote:
> > > > From: "Darrick J. Wong" <djwong@kernel.org>
> > > >
> > > > Provide a new function call so that validation errors can be reported
> > > > back to the filesystem.
> > >
> > > This feels like an awfully generic name for a very specific error
> > > condition.
> >
> > <shrug> ->verity_failure?
>
> Either that or make it generic. Either way it needs to be documented
> in the usual places that document file operations including explaining
> when it is supposed to be called.
FYI, I only realized now this is actually a fsverity operation, not
a file operation. Sorry for the noise.
So maybe file_corrupt is fine, but verity_failure sounds fine too.
Maybe the fsverity maintainers have an opinion.
* Re: [PATCH RFC 14/29] xfs: add attribute type for fs-verity
2025-08-12 7:44 ` Christoph Hellwig
@ 2025-08-12 17:11 ` Andrey Albershteyn
2025-08-12 19:12 ` Darrick J. Wong
0 siblings, 1 reply; 75+ messages in thread
From: Andrey Albershteyn @ 2025-08-12 17:11 UTC (permalink / raw)
To: Christoph Hellwig
Cc: fsverity, linux-fsdevel, linux-xfs, david, djwong, ebiggers,
Andrey Albershteyn
On 2025-08-12 09:44:15, Christoph Hellwig wrote:
> On Mon, Aug 11, 2025 at 09:00:29PM +0200, Andrey Albershteyn wrote:
> > Mostly because it was already implemented. But looking for benefits,
> > attr can be inode LOCAL so a bit of saved space? Also, seems like a
> > better interface than to look at a magic offset
>
> Well, can you document the rationale somewhere?
>
We discussed this a bit with Darrick, and it probably makes more
sense to have the descriptor in the data fork if fscrypt is added: the
descriptor holds the root hash of the tree (on small files this is
just a hash of the file), and fscrypt expects the Merkle tree to be
encrypted, as it is a hash of plaintext data.
--
- Andrey
* Re: [PATCH RFC 12/29] fsverity: expose merkle tree geometry to callers
2025-08-12 7:42 ` Christoph Hellwig
@ 2025-08-12 19:09 ` Darrick J. Wong
0 siblings, 0 replies; 75+ messages in thread
From: Darrick J. Wong @ 2025-08-12 19:09 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Andrey Albershteyn, fsverity, linux-fsdevel, linux-xfs, david,
ebiggers
On Tue, Aug 12, 2025 at 09:42:08AM +0200, Christoph Hellwig wrote:
> On Mon, Aug 11, 2025 at 08:38:22AM -0700, Darrick J. Wong wrote:
> > > Just curious, why does xfs need this, but the existing file systems
> > > don't? That would be some good background information for the commit
> > > message.
> >
> > Hrmmm... the last time I sent this RFC, online fsck used it to check the
> > validity of the merkle tree xattrs.
>
> I saw a few users, so it does get used. But patches exporting something
> should in general document what the use case is.
>
> > > > + if (!IS_VERITY(inode))
> > > > + return -ENODATA;
> > > > +
> > > > + error = ensure_verity_info(inode);
> > > > + if (error)
> > > > + return error;
> > > > +
> > > > + vi = inode->i_verity_info;
> > >
> > > Wouldn't it be a better interface to return the verity_info from
> > > ensure_verity_info (NULL for !IS_VERITY, ERR_PTR for real error)
> > > and then just look at the fields directly?
> >
> > They're private to fsverity_private.h.
>
> Indeed. Is ensure_verity_info even the right thing here? I.e.
> should querying the parameters create the info if it wasn't there
> yet?
I think it's usually the case that we're about to access the merkle tree
anyway, so the next step in whatever we're doing would load it for us.
--D
* Re: [PATCH RFC 14/29] xfs: add attribute type for fs-verity
2025-08-12 17:11 ` Andrey Albershteyn
@ 2025-08-12 19:12 ` Darrick J. Wong
0 siblings, 0 replies; 75+ messages in thread
From: Darrick J. Wong @ 2025-08-12 19:12 UTC (permalink / raw)
To: Andrey Albershteyn
Cc: Christoph Hellwig, fsverity, linux-fsdevel, linux-xfs, david,
ebiggers, Andrey Albershteyn
On Tue, Aug 12, 2025 at 07:11:24PM +0200, Andrey Albershteyn wrote:
> On 2025-08-12 09:44:15, Christoph Hellwig wrote:
> > On Mon, Aug 11, 2025 at 09:00:29PM +0200, Andrey Albershteyn wrote:
> > > Mostly because it was already implemented. But looking for benefits,
> > > attr can be inode LOCAL so a bit of saved space? Also, seems like a
> > > better interface than to look at a magic offset
> >
> > Well, can you document the rationale somewhere?
> >
>
> We discussed this a bit with Darrick, and it probably makes more
> sense to have the descriptor in the data fork if fscrypt is added: the
> descriptor holds the root hash of the tree (on small files this is
> just a hash of the file), and fscrypt expects the Merkle tree to be
> encrypted, as it is a hash of plaintext data.
To cite my own sources, the last Q in the Q&A in
https://docs.kernel.org/filesystems/fsverity.html#faq
states that:
"ext4 and f2fs encryption doesn’t encrypt xattrs, yet the Merkle tree
must be encrypted when the file contents are, because it stores hashes
of the plaintext file contents."
So on the grounds that we're following the ext4/f2fs merkle tree layout
model to keep our options open for fscrypt later, I think we need the
verity descriptor to be in the posteof file data, not an xattr.
--D
* Re: [PATCH RFC 04/29] fsverity: add per-sb workqueue for post read processing
2025-08-12 7:43 ` Christoph Hellwig
@ 2025-08-12 19:52 ` Tejun Heo
0 siblings, 0 replies; 75+ messages in thread
From: Tejun Heo @ 2025-08-12 19:52 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Andrey Albershteyn, fsverity, linux-fsdevel, linux-xfs, david,
djwong, ebiggers, Lai Jiangshan, linux-kernel
Hello,
On Tue, Aug 12, 2025 at 09:43:50AM +0200, Christoph Hellwig wrote:
...
> Andrey (or others involved with previous versions): is interference
> with the log completion workqueue what you ran into?
>
> Tejun, are you going to prepare a patch to fix the rescuer priority?
NVM, I was confused. All rescuers, regardless of the associated workqueue,
set their nice level to MIN_NICE. IIRC, the rationale was that by the time
rescuer triggers the queued work items already have experienced noticeable
latencies and that rescuer invocations would be pretty rare. I'd be
surprised if rescuer behavior is showing up as easily observable
interferences in most cases. The system should already be thrashing quite a
bit for rescuers to be active and whatever noise rescuer behavior might
cause should usually be drowned by other things.
Thanks.
--
tejun
Thread overview: 75+ messages
2025-07-28 20:30 [PATCH RFC 00/29] fs-verity support for XFS with post EOF merkle tree Andrey Albershteyn
2025-07-28 20:30 ` [PATCH RFC 01/29] iomap: add iomap_writepages_unbound() to write beyond EOF Andrey Albershteyn
2025-07-29 22:07 ` Darrick J. Wong
2025-07-31 15:04 ` Andrey Albershteyn
2025-07-31 18:43 ` Joanne Koong
2025-08-04 11:34 ` Andrey Albershteyn
2025-07-28 20:30 ` [PATCH RFC 02/29] iomap: introduce iomap_read/write_region interface Andrey Albershteyn
2025-07-29 22:22 ` Darrick J. Wong
2025-07-31 15:51 ` Andrey Albershteyn
2025-08-11 11:43 ` Christoph Hellwig
2025-07-28 20:30 ` [PATCH RFC 03/29] fs: add FS_XFLAG_VERITY for verity files Andrey Albershteyn
2025-07-29 9:53 ` Amir Goldstein
2025-07-29 10:35 ` Andrey Albershteyn
2025-07-29 12:06 ` Amir Goldstein
2025-08-12 7:51 ` Christoph Hellwig
2025-07-28 20:30 ` [PATCH RFC 04/29] fsverity: add per-sb workqueue for post read processing Andrey Albershteyn
2025-08-11 11:45 ` Christoph Hellwig
2025-08-11 17:51 ` Tejun Heo
2025-08-12 7:43 ` Christoph Hellwig
2025-08-12 19:52 ` Tejun Heo
2025-07-28 20:30 ` [PATCH RFC 05/29] fsverity: add tracepoints Andrey Albershteyn
2025-07-28 20:30 ` [PATCH RFC 06/29] fsverity: report validation errors back to the filesystem Andrey Albershteyn
2025-08-11 11:46 ` Christoph Hellwig
2025-08-11 15:31 ` Darrick J. Wong
2025-08-12 7:34 ` Christoph Hellwig
2025-08-12 7:56 ` Christoph Hellwig
2025-07-28 20:30 ` [PATCH RFC 07/29] fsverity: pass super_block to fsverity_enqueue_verify_work Andrey Albershteyn
2025-07-28 20:30 ` [PATCH RFC 08/29] ext4: use a per-superblock fsverity workqueue Andrey Albershteyn
2025-07-28 20:30 ` [PATCH RFC 09/29] f2fs: " Andrey Albershteyn
2025-07-28 20:30 ` [PATCH RFC 10/29] btrfs: " Andrey Albershteyn
2025-07-28 20:30 ` [PATCH RFC 11/29] fsverity: remove system-wide workqueue Andrey Albershteyn
2025-07-28 20:30 ` [PATCH RFC 12/29] fsverity: expose merkle tree geometry to callers Andrey Albershteyn
2025-08-11 11:48 ` Christoph Hellwig
2025-08-11 15:38 ` Darrick J. Wong
2025-08-11 19:06 ` Andrey Albershteyn
2025-08-12 7:42 ` Christoph Hellwig
2025-08-12 19:09 ` Darrick J. Wong
2025-07-28 20:30 ` [PATCH RFC 13/29] iomap: integrate fs-verity verification into iomap's read path Andrey Albershteyn
2025-07-29 23:21 ` Darrick J. Wong
2025-07-31 11:34 ` Andrey Albershteyn
2025-07-31 14:52 ` Darrick J. Wong
2025-07-31 15:01 ` Andrey Albershteyn
2025-07-31 15:08 ` Darrick J. Wong
2025-07-28 20:30 ` [PATCH RFC 14/29] xfs: add attribute type for fs-verity Andrey Albershteyn
2025-08-11 11:50 ` Christoph Hellwig
2025-08-11 19:00 ` Andrey Albershteyn
2025-08-12 7:44 ` Christoph Hellwig
2025-08-12 17:11 ` Andrey Albershteyn
2025-08-12 19:12 ` Darrick J. Wong
2025-07-28 20:30 ` [PATCH RFC 15/29] xfs: add fs-verity ro-compat flag Andrey Albershteyn
2025-07-28 20:30 ` [PATCH RFC 16/29] xfs: add inode on-disk VERITY flag Andrey Albershteyn
2025-07-28 20:30 ` [PATCH RFC 17/29] xfs: initialize fs-verity on file open and cleanup on inode destruction Andrey Albershteyn
2025-07-28 20:30 ` [PATCH RFC 18/29] xfs: don't allow to enable DAX on fs-verity sealed inode Andrey Albershteyn
2025-07-28 20:30 ` [PATCH RFC 19/29] xfs: disable direct read path for fs-verity files Andrey Albershteyn
2025-07-28 20:30 ` [PATCH RFC 20/29] xfs: disable preallocations for fsverity Merkle tree writes Andrey Albershteyn
2025-07-29 22:27 ` Darrick J. Wong
2025-07-31 11:42 ` Andrey Albershteyn
2025-07-31 14:49 ` Darrick J. Wong
2025-07-28 20:30 ` [PATCH RFC 21/29] xfs: add writeback and iomap reading of Merkel tree pages Andrey Albershteyn
2025-07-29 22:33 ` Darrick J. Wong
2025-07-28 20:30 ` [PATCH RFC 22/29] xfs: add fs-verity support Andrey Albershteyn
2025-07-29 23:05 ` Darrick J. Wong
2025-07-31 14:50 ` Andrey Albershteyn
2025-07-31 15:07 ` Darrick J. Wong
2025-07-28 20:30 ` [PATCH RFC 23/29] xfs: add fs-verity ioctls Andrey Albershteyn
2025-07-28 20:30 ` [PATCH RFC 24/29] xfs: advertise fs-verity being available on filesystem Andrey Albershteyn
2025-07-28 20:30 ` [PATCH RFC 25/29] xfs: check and repair the verity inode flag state Andrey Albershteyn
2025-07-28 20:30 ` [PATCH RFC 26/29] xfs: fix scrub trace with null pointer in quotacheck Andrey Albershteyn
2025-07-29 15:28 ` Darrick J. Wong
2025-07-31 14:54 ` Andrey Albershteyn
2025-07-31 16:03 ` Carlos Maiolino
2025-07-28 20:30 ` [PATCH RFC 27/29] xfs: report verity failures through the health system Andrey Albershteyn
2025-07-28 20:30 ` [PATCH RFC 28/29] xfs: add fsverity traces Andrey Albershteyn
2025-07-29 23:06 ` Darrick J. Wong
2025-07-28 20:30 ` [PATCH RFC 29/29] xfs: enable ro-compat fs-verity flag Andrey Albershteyn