From: "Darrick J. Wong" <djwong@kernel.org>
To: Andrey Albershteyn <aalbersh@redhat.com>
Cc: fsverity@lists.linux.dev, linux-xfs@vger.kernel.org,
ebiggers@kernel.org, linux-fsdevel@vger.kernel.org,
aalbersh@kernel.org, david@fromorbit.com, hch@lst.de
Subject: Re: [PATCH v2 16/22] xfs: add fs-verity support
Date: Wed, 14 Jan 2026 08:40:22 -0800
Message-ID: <20260114164022.GN15583@frogsfrogsfrogs>
In-Reply-To: <vtkbi6fc2hhacxj5rmxqfxiawg5iqabsoxmxosm3oawqtqrbv5@vniylcxwzkcv>
On Tue, Jan 13, 2026 at 07:32:06PM +0100, Andrey Albershteyn wrote:
> On 2026-01-12 15:05:48, Darrick J. Wong wrote:
> > On Mon, Jan 12, 2026 at 03:51:43PM +0100, Andrey Albershteyn wrote:
> > > Add integration with fs-verity. XFS stores fs-verity descriptor and
> > > Merkle tree in the inode data fork at offset file offset (1 << 53).
> > >
> > > The Merkle tree reading/writing is done through iomap interface. The
> > > data itself are read to the inode's page cache. When XFS reads from this
> > > region iomap doesn't call into fsverity to verify it against Merkle
> > > tree. For data, verification is done on BIO completion in a workqueue.
> > >
> > > When fs-verity is enabled on an inode, the XFS_IVERITY_CONSTRUCTION
> > > flag is set meaning that the Merkle tree is being build. The
> > > initialization ends with storing of verity descriptor and setting
> > > inode on-disk flag (XFS_DIFLAG2_VERITY).
<snip some of this stuff>
> > > +/*
> > > + * Try to remove all the fsverity metadata after a failed enablement.
> > > + */
> > > +static int
> > > +xfs_fsverity_delete_metadata(
> > > + struct xfs_inode *ip)
> > > +{
> > > + struct xfs_trans *tp;
> > > + struct xfs_mount *mp = ip->i_mount;
> > > + int error;
> > > +
> > > + error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, 0, 0, 0, &tp);
> > > + if (error)
> > > + return error;
> > > +
> > > + xfs_ilock(ip, XFS_MMAPLOCK_EXCL);
> >
> > The MMAPLOCK should be taken before the transaction allocation like
> > everything else in xfs.
>
> hmm, but is it needed for transaction? I meant to only wrap
> xfs_truncate_page() with mmaplock

The MMAPLOCK isn't needed to run the transaction, but the normal
resource acquisition order is:

i_rwsem (IOLOCK) -> mmap_invalidatelock (MMAPLOCK) -> log grant space
(aka xfs_trans_reserve) -> ILOCK.

By taking the MMAPLOCK after log space has been granted, you're now
risking an ABBA deadlock because another thread could hold the same
MMAPLOCK and is waiting for a log grant while the current thread took
the last of the log grant space and is now waiting for the MMAPLOCK.
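
To make the ordering rule concrete, here is a minimal userspace sketch. It
is not kernel code: the rank enum, first_order_violation() and the two
arrays are hypothetical names invented for illustration, and the model
assumes nothing is released between acquisitions. It only checks whether a
sequence of acquisitions follows the IOLOCK -> MMAPLOCK -> log grant ->
ILOCK order described above.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical ranks modelling the acquisition order quoted above.
 * A lower rank must be taken before a higher one.
 */
enum lock_rank {
	RANK_IOLOCK = 1,	/* i_rwsem */
	RANK_MMAPLOCK,		/* mapping invalidate lock */
	RANK_LOG_GRANT,		/* log space, i.e. xfs_trans_reserve */
	RANK_ILOCK,
};

/*
 * Walk a sequence of acquisitions, assuming nothing is released in
 * between.  Return the index of the first acquisition whose rank is not
 * above everything already held, or -1 if the sequence is well ordered.
 */
static int first_order_violation(const enum lock_rank *seq, size_t n)
{
	int highest = 0;
	size_t i;

	assert(seq != NULL);
	for (i = 0; i < n; i++) {
		if ((int)seq[i] <= highest)
			return (int)i;
		highest = (int)seq[i];
	}
	return -1;
}

/* Order in the patch as posted: log space first, then MMAPLOCK. */
static const enum lock_rank posted_order[] = {
	RANK_IOLOCK, RANK_LOG_GRANT, RANK_MMAPLOCK,
};

/* Order requested in review: MMAPLOCK before the transaction. */
static const enum lock_rank requested_order[] = {
	RANK_IOLOCK, RANK_MMAPLOCK, RANK_LOG_GRANT, RANK_ILOCK,
};
```

In this model, first_order_violation(posted_order, 3) flags index 2 (the
MMAPLOCK taken while log space is already held), and
first_order_violation(requested_order, 4) reports no violation.
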

Why are we calling xfs_truncate_page on the EOF page?

Should we hold the MMAPLOCK during the entire merkle tree construction
process so that page_mkwrite is blocked?

Also, should page_mkwrite return an error if either S_VERITY or
XFS_VERITY_CONSTRUCTION are set on the inode?

--D

> > > + error = xfs_truncate_page(ip, XFS_ISIZE(ip), NULL, NULL);
> > > + xfs_iunlock(ip, XFS_MMAPLOCK_EXCL);
> > > +
> > > + xfs_ilock(ip, XFS_ILOCK_EXCL);
> > > + xfs_trans_ijoin(tp, ip, 0);
> > > +
> > > + /*
> > > + * We removing post EOF data, no need to update i_size
> > > + */
> > > + error = xfs_itruncate_extents(&tp, ip, XFS_DATA_FORK, XFS_ISIZE(ip));
> > > + if (error)
> > > + goto err_cancel;
> > > +
> > > + error = xfs_trans_commit(tp);
> > > + if (error)
> > > + goto err_cancel;
> > > + xfs_iunlock(ip, XFS_ILOCK_EXCL);
> > > +
> > > + return error;
> > > +
> > > +err_cancel:
> > > + xfs_iunlock(ip, XFS_ILOCK_EXCL);
> > > + xfs_trans_cancel(tp);
> > > + return error;
> > > +}
> > > +
> > > +
> > > +/*
> > > + * Prepare to enable fsverity by clearing old metadata.
> > > + */
> > > +static int
> > > +xfs_fsverity_begin_enable(
> > > + struct file *filp)
> > > +{
> > > + struct inode *inode = file_inode(filp);
> > > + struct xfs_inode *ip = XFS_I(inode);
> > > + int error;
> > > +
> > > + xfs_assert_ilocked(ip, XFS_IOLOCK_EXCL);
> > > +
> > > + if (IS_DAX(inode))
> > > + return -EINVAL;
> > > +
> > > + if (inode->i_size > XFS_FSVERITY_REGION_START)
> > > + return -EFBIG;
> > > +
> > > + if (xfs_iflags_test_and_set(ip, XFS_VERITY_CONSTRUCTION))
> > > + return -EBUSY;
> > > +
> > > + error = xfs_qm_dqattach(ip);
> > > + if (error)
> > > + return error;
> > > +
> > > + /*
> > > + * Flush pagecache before building Merkle tree. Inode is locked and no
> > > + * further writes will happen to the file except fsverity metadata
> >
> > Don't we need to take the MMAPLOCK to prevent concurrent write faults?
>
> will add it
>
> >
> > > + */
> > > + error = filemap_write_and_wait(inode->i_mapping);
> > > + if (error)
> > > + return error;
> > > +
> > > + return xfs_fsverity_delete_metadata(ip);
> > > +}
> > > +
> > > +/*
> > > + * Complete (or fail) the process of enabling fsverity.
> > > + */
> > > +static int
> > > +xfs_fsverity_end_enable(
> > > + struct file *filp,
> > > + const void *desc,
> > > + size_t desc_size,
> > > + u64 merkle_tree_size)
> > > +{
> > > + struct inode *inode = file_inode(filp);
> > > + struct xfs_inode *ip = XFS_I(inode);
> > > + struct xfs_mount *mp = ip->i_mount;
> > > + struct xfs_trans *tp;
> > > + int error = 0;
> > > +
> > > + xfs_assert_ilocked(ip, XFS_IOLOCK_EXCL);
> > > +
> > > + /* fs-verity failed, just cleanup */
> > > + if (desc == NULL)
> > > + goto out;
> > > +
> > > + error = xfs_fsverity_write_descriptor(ip, desc, desc_size,
> > > + merkle_tree_size);
> > > + if (error)
> > > + goto out;
> > > +
> > > + /*
> > > + * Wait for Merkle tree get written to disk before setting on-disk inode
> > > + * flag and clearing XFS_VERITY_CONSTRUCTION
> > > + */
> > > + error = filemap_write_and_wait(inode->i_mapping);
> > > + if (error)
> > > + goto out;
> > > +
> > > + /*
> > > + * Set fsverity inode flag
> > > + */
> > > + error = xfs_trans_alloc_inode(ip, &M_RES(mp)->tr_ichange,
> > > + 0, 0, false, &tp);
> > > + if (error)
> > > + goto out;
> > > +
> > > + /*
> > > + * Ensure that we've persisted the verity information before we enable
> > > + * it on the inode and tell the caller we have sealed the inode.
> > > + */
> > > + ip->i_diflags2 |= XFS_DIFLAG2_VERITY;
> > > +
> > > + xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
> > > + xfs_trans_set_sync(tp);
> > > +
> > > + error = xfs_trans_commit(tp);
> > > + xfs_iunlock(ip, XFS_ILOCK_EXCL);
> > > +
> > > + if (!error)
> > > + inode->i_flags |= S_VERITY;
> > > +
> > > +out:
> > > + if (error) {
> > > + int error2;
> > > +
> > > + error2 = xfs_fsverity_delete_metadata(ip);
> > > + if (error2)
> > > + xfs_alert(ip->i_mount,
> > > +"ino 0x%llx failed to clean up new fsverity metadata, err %d",
> > > + ip->i_ino, error2);
> > > + }
> > > +
> > > + xfs_iflags_clear(ip, XFS_VERITY_CONSTRUCTION);
> > > + return error;
> > > +}
> > > +
> > > +/*
> > > + * Retrieve a merkle tree block.
> > > + */
> > > +static struct page *
> > > +xfs_fsverity_read_merkle(
> > > + struct inode *inode,
> > > + pgoff_t index,
> > > + unsigned long num_ra_pages)
> > > +{
> > > + struct folio *folio;
> > > + pgoff_t offset =
> > > + index | (XFS_FSVERITY_REGION_START >> PAGE_SHIFT);
> > > +
> > > + folio = __filemap_get_folio(inode->i_mapping, offset, FGP_ACCESSED, 0);
> > > + if (IS_ERR(folio) || !folio_test_uptodate(folio)) {
> > > + DEFINE_READAHEAD(ractl, NULL, NULL, inode->i_mapping, offset);
> > > +
> > > + if (!IS_ERR(folio))
> > > + folio_put(folio);
> > > + else if (num_ra_pages > 1)
> > > + page_cache_ra_unbounded(&ractl, num_ra_pages, 0);
> > > + folio = read_mapping_folio(inode->i_mapping, offset, NULL);
> > > + if (IS_ERR(folio))
> > > + return ERR_CAST(folio);
> > > + }
> > > + return folio_file_page(folio, offset);
> >
> > Shouldn't this be some _BEYOND_EOF variant of generic_file_read_iter?
>
> _BEYOND_EOF is set by xfs_iomap_read_begin() while mapping the
> blocks, and also this uses _unbounded version of readhead to skip
> EOF checks.
>
> >
> > > +}
> > > +
> > > +/*
> > > + * Write a merkle tree block.
> > > + */
> > > +static int
> > > +xfs_fsverity_write_merkle(
> > > + struct inode *inode,
> > > + const void *buf,
> > > + u64 pos,
> > > + unsigned int size)
> > > +{
> > > + struct xfs_inode *ip = XFS_I(inode);
> > > + loff_t position = pos | XFS_FSVERITY_REGION_START;
> > > +
> > > + if (position + size > inode->i_sb->s_maxbytes)
> > > + return -EFBIG;
> > > +
> > > + return xfs_fsverity_write(ip, position, size, buf);
> > > +}
> > > +
> > > +const ptrdiff_t info_offs = (int)offsetof(struct xfs_inode, i_verity_info) -
> > > + (int)offsetof(struct xfs_inode, i_vnode);
> >
> > I ... wow.
> >
> > Not blaming you for writing this, just surprised that the common code
> > makes you do that.
> >
> > > +const struct fsverity_operations xfs_fsverity_ops = {
> > > + .inode_info_offs = info_offs,
> > > + .begin_enable_verity = xfs_fsverity_begin_enable,
> > > + .end_enable_verity = xfs_fsverity_end_enable,
> > > + .get_verity_descriptor = xfs_fsverity_get_descriptor,
> > > + .read_merkle_tree_page = xfs_fsverity_read_merkle,
> > > + .write_merkle_tree_block = xfs_fsverity_write_merkle,
> > > +};
> > > diff --git a/fs/xfs/xfs_fsverity.h b/fs/xfs/xfs_fsverity.h
> > > new file mode 100644
> > > index 0000000000..8b0d7ef456
> > > --- /dev/null
> > > +++ b/fs/xfs/xfs_fsverity.h
> > > @@ -0,0 +1,12 @@
> > > +/* SPDX-License-Identifier: GPL-2.0 */
> > > +/*
> > > + * Copyright (C) 2022 Red Hat, Inc.
> > > + */
> > > +#ifndef __XFS_FSVERITY_H__
> > > +#define __XFS_FSVERITY_H__
> > > +
> > > +#ifdef CONFIG_FS_VERITY
> > > +extern const struct fsverity_operations xfs_fsverity_ops;
> > > +#endif /* CONFIG_FS_VERITY */
> > > +
> > > +#endif /* __XFS_FSVERITY_H__ */
> > > diff --git a/fs/xfs/xfs_message.c b/fs/xfs/xfs_message.c
> > > index 19aba2c3d5..17f0f0ca7b 100644
> > > --- a/fs/xfs/xfs_message.c
> > > +++ b/fs/xfs/xfs_message.c
> > > @@ -161,6 +161,10 @@
> > > .opstate = XFS_OPSTATE_WARNED_ZONED,
> > > .name = "zoned RT device",
> > > },
> > > + [XFS_EXPERIMENTAL_FSVERITY] = {
> > > + .opstate = XFS_OPSTATE_WARNED_ZONED,
> > > + .name = "fsverity",
> > > + },
> > > };
> > > ASSERT(feat >= 0 && feat < XFS_EXPERIMENTAL_MAX);
> > > BUILD_BUG_ON(ARRAY_SIZE(features) != XFS_EXPERIMENTAL_MAX);
> > > diff --git a/fs/xfs/xfs_message.h b/fs/xfs/xfs_message.h
> > > index d68e72379f..1647d32ea4 100644
> > > --- a/fs/xfs/xfs_message.h
> > > +++ b/fs/xfs/xfs_message.h
> > > @@ -96,6 +96,7 @@
> > > XFS_EXPERIMENTAL_LBS,
> > > XFS_EXPERIMENTAL_METADIR,
> > > XFS_EXPERIMENTAL_ZONED,
> > > + XFS_EXPERIMENTAL_FSVERITY,
> > >
> > > XFS_EXPERIMENTAL_MAX,
> > > };
> > > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> > > index 10c6fc8d20..42a16b15a6 100644
> > > --- a/fs/xfs/xfs_super.c
> > > +++ b/fs/xfs/xfs_super.c
> > > @@ -30,6 +30,7 @@
> > > #include "xfs_filestream.h"
> > > #include "xfs_quota.h"
> > > #include "xfs_sysfs.h"
> > > +#include "xfs_fsverity.h"
> > > #include "xfs_ondisk.h"
> > > #include "xfs_rmap_item.h"
> > > #include "xfs_refcount_item.h"
> > > @@ -54,6 +55,7 @@
> > > #include <linux/fs_context.h>
> > > #include <linux/fs_parser.h>
> > > #include <linux/fsverity.h>
> > > +#include <linux/iomap.h>
> > >
> > > static const struct super_operations xfs_super_operations;
> > >
> > > @@ -1706,6 +1708,9 @@
> > > sb->s_quota_types = QTYPE_MASK_USR | QTYPE_MASK_GRP | QTYPE_MASK_PRJ;
> > > #endif
> > > sb->s_op = &xfs_super_operations;
> > > +#ifdef CONFIG_FS_VERITY
> > > + sb->s_vop = &xfs_fsverity_ops;
> > > +#endif
> > >
> > > /*
> > > * Delay mount work if the debug hook is set. This is debug
> > > @@ -1959,10 +1964,19 @@
> > > xfs_set_resuming_quotaon(mp);
> > > mp->m_qflags &= ~XFS_QFLAGS_MNTOPTS;
> > >
> > > + if (xfs_has_verity(mp))
> > > + xfs_warn_experimental(mp, XFS_EXPERIMENTAL_FSVERITY);
> > > +
> > > error = xfs_mountfs(mp);
> > > if (error)
> > > goto out_filestream_unmount;
> > >
> > > +#ifdef CONFIG_FS_VERITY
> > > + error = iomap_fsverity_init_bioset();
> >
> > if (xfs_has_verity()) ?
>
> sure, will change it
>
> --
> - Andrey
>