From: Andrey Albershteyn
To: linux-xfs@vger.kernel.org, fsverity@lists.linux.dev, linux-fsdevel@vger.kernel.org, ebiggers@kernel.org
Cc: Andrey Albershteyn, hch@lst.de, djwong@kernel.org
Subject: [PATCH v3 28/35] xfs: add fs-verity support
Date: Wed, 18 Feb 2026 00:19:28 +0100
Message-ID: <20260217231937.1183679-29-aalbersh@kernel.org>
In-Reply-To: <20260217231937.1183679-1-aalbersh@kernel.org>
References: <20260217231937.1183679-1-aalbersh@kernel.org>
X-Mailing-List: linux-xfs@vger.kernel.org

Add integration with fs-verity. XFS stores the fs-verity descriptor and
Merkle tree in the inode data fork at file offset (1 << 53).

The Merkle tree is read and written through the iomap interface. The
Merkle tree data itself is read into the inode's page cache. When XFS
reads from this region, iomap does not call into fs-verity to verify it
against the Merkle tree. For file data, verification is done on BIO
completion in a workqueue.

When fs-verity is enabled on an inode, the XFS_VERITY_CONSTRUCTION flag
is set, meaning that the Merkle tree is being built. Initialization ends
by storing the verity descriptor and setting the on-disk inode flag
(XFS_DIFLAG2_VERITY). XFS_VERITY_CONSTRUCTION is then dropped and
S_VERITY is set on the inode.

The descriptor is stored in a new block after the last Merkle tree block.
The size of the descriptor is stored as a big-endian 32-bit value in the
last four bytes of the last descriptor block (the descriptor can span
multiple blocks).
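The descriptor placement described above can be checked with a small
userspace sketch. It reproduces the offset arithmetic of
xfs_fsverity_write_descriptor() and xfs_fsverity_get_descriptor() under a
few assumptions: the block size is a power of two, everything is expressed
as file offsets (the xfs_fsverity_offset_from_disk() conversion is
omitted), and the last data fork extent ends exactly at the end of the last
descriptor block. The helper and struct names are hypothetical, made up
for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Power-of-two rounding, equivalent to the kernel's round_up()/round_down(). */
static uint64_t round_up_pow2(uint64_t x, uint64_t align)
{
	assert(align && (align & (align - 1)) == 0);
	return (x + align - 1) & ~(align - 1);
}

static uint64_t round_down_pow2(uint64_t x, uint64_t align)
{
	assert(align && (align & (align - 1)) == 0);
	return x & ~(align - 1);
}

/* Hypothetical container for the offsets computed on the write side. */
struct desc_layout {
	uint64_t desc_pos;      /* first byte of the descriptor */
	uint64_t desc_size_pos; /* offset of the trailing big-endian size */
	uint64_t end;           /* end of the last descriptor block */
};

/* Write side: same arithmetic as xfs_fsverity_write_descriptor(). */
static struct desc_layout layout_for_write(uint64_t tree_end,
					   uint32_t desc_size,
					   uint32_t blksize)
{
	struct desc_layout l;

	/* Descriptor starts in the first block after the Merkle tree. */
	l.desc_pos = round_up_pow2(tree_end, blksize);
	/* Size goes in the last 4 bytes of the last descriptor block. */
	l.desc_size_pos = round_up_pow2(l.desc_pos + desc_size +
					sizeof(uint32_t), blksize) -
			  sizeof(uint32_t);
	l.end = l.desc_size_pos + sizeof(uint32_t);
	return l;
}

/*
 * Read side: same arithmetic as xfs_fsverity_get_descriptor(), given the
 * end of the last allocated block and the size stored just before it.
 */
static uint64_t desc_pos_for_read(uint64_t last_block_end,
				  uint32_t desc_size, uint32_t blksize)
{
	uint64_t desc_size_pos = last_block_end - sizeof(uint32_t);

	return round_down_pow2(desc_size_pos - desc_size, blksize);
}
```

For example, with 4096-byte blocks, a Merkle tree ending at (1 << 53) + 5000
and a 300-byte descriptor, layout_for_write() places the descriptor at
(1 << 53) + 8192 and its size at (1 << 53) + 12284, and
desc_pos_for_read((1 << 53) + 12288, 300, 4096) recovers (1 << 53) + 8192.
Because the descriptor always starts on a block boundary, rounding down
from (size position - size) lands back on that boundary for any size.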
Signed-off-by: Andrey Albershteyn
---
 fs/xfs/Makefile        |   1 +
 fs/xfs/xfs_bmap_util.c |   8 +
 fs/xfs/xfs_fsverity.c  | 426 ++++++++++++++++++++++++++++++++++++++++-
 fs/xfs/xfs_fsverity.h  |   2 +
 fs/xfs/xfs_message.c   |   4 +
 fs/xfs/xfs_message.h   |   1 +
 fs/xfs/xfs_mount.h     |   2 +
 fs/xfs/xfs_super.c     |   7 +
 8 files changed, 450 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 5bf501cf8271..ad66439db7bf 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -147,6 +147,7 @@ xfs-$(CONFIG_XFS_POSIX_ACL)	+= xfs_acl.o
 xfs-$(CONFIG_SYSCTL)		+= xfs_sysctl.o
 xfs-$(CONFIG_COMPAT)		+= xfs_ioctl32.o
 xfs-$(CONFIG_EXPORTFS_BLOCK_OPS)	+= xfs_pnfs.o
+xfs-$(CONFIG_FS_VERITY)		+= xfs_fsverity.o
 
 # notify failure
 ifeq ($(CONFIG_MEMORY_FAILURE),y)
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 2208a720ec3f..3cd638e4f8b9 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -31,6 +31,7 @@
 #include "xfs_rtbitmap.h"
 #include "xfs_rtgroup.h"
 #include "xfs_zone_alloc.h"
+#include
 
 /* Kernel only BMAP related definitions and functions */
 
@@ -553,6 +554,13 @@ xfs_can_free_eofblocks(
 	if (last_fsb <= end_fsb)
 		return false;
 
+	/*
+	 * Nothing to clean on fsverity inodes: they don't use preallocation
+	 * and have no delalloc, since only fsverity metadata is written.
+	 */
+	if (IS_VERITY(VFS_I(ip)))
+		return false;
+
 	/*
 	 * Check if there is an post-EOF extent to free. If there are any
 	 * delalloc blocks attached to the inode (data fork delalloc
diff --git a/fs/xfs/xfs_fsverity.c b/fs/xfs/xfs_fsverity.c
index 4f8a40317dc3..cdbe26ab88b7 100644
--- a/fs/xfs/xfs_fsverity.c
+++ b/fs/xfs/xfs_fsverity.c
@@ -4,14 +4,25 @@
  */
 #include "xfs.h"
 #include "xfs_format.h"
-#include "xfs_inode.h"
 #include "xfs_shared.h"
 #include "xfs_trans_resv.h"
 #include "xfs_mount.h"
 #include "xfs_fsverity.h"
+#include "xfs_da_format.h"
+#include "xfs_da_btree.h"
+#include "xfs_inode.h"
+#include "xfs_log_format.h"
+#include "xfs_bmap_util.h"
+#include "xfs_trans.h"
+#include "xfs_trace.h"
+#include "xfs_quota.h"
 #include "xfs_fsverity.h"
+#include "xfs_iomap.h"
+#include "xfs_error.h"
+#include "xfs_health.h"
 #include
 #include
+#include
 
 loff_t
 xfs_fsverity_offset_to_disk(struct xfs_inode *ip, loff_t offset)
@@ -61,3 +72,416 @@ xfs_fsverity_need_convert_offset(
 }
 
 
+static int
+xfs_fsverity_read(
+	struct inode		*inode,
+	void			*buf,
+	size_t			count,
+	loff_t			pos)
+{
+	struct folio		*folio;
+	size_t			n;
+
+	while (count) {
+		folio = read_mapping_folio(inode->i_mapping, pos >> PAGE_SHIFT,
+				NULL);
+		if (IS_ERR(folio))
+			return PTR_ERR(folio);
+
+		n = memcpy_from_file_folio(buf, folio, pos, count);
+		folio_put(folio);
+
+		buf += n;
+		pos += n;
+		count -= n;
+	}
+	return 0;
+}
+
+static int
+xfs_fsverity_write(
+	struct file		*file,
+	loff_t			pos,
+	size_t			length,
+	const void		*buf)
+{
+	int			ret;
+	struct iov_iter		iiter;
+	struct kvec		kvec = {
+		.iov_base	= (void *)buf,
+		.iov_len	= length,
+	};
+	struct kiocb		iocb = {
+		.ki_filp	= file,
+		.ki_ioprio	= get_current_ioprio(),
+		.ki_pos		= pos,
+	};
+
+	iov_iter_kvec(&iiter, WRITE, &kvec, 1, length);
+
+	ret = iomap_file_buffered_write(&iocb, &iiter,
+				   &xfs_buffered_write_iomap_ops,
+				   &xfs_iomap_write_ops, NULL);
+	if (ret < 0)
+		return ret;
+	return 0;
+}
+
+/*
+ * When the Merkle tree block size (1k) equals the fs block size (1k) and is
+ * smaller than PAGE_SIZE (4k), we can get the following layout in the file:
+ *
+ * [ merkle block | 1k hole | 1k hole | fsverity descriptor ]
+ *
+ * These holes are Merkle tree blocks that iomap fills with hashes of zeroed
+ * data blocks.
+ *
+ * Everything in fsverity starts with reading the descriptor. When iomap reads
+ * this page for the descriptor, it doesn't yet know how to synthesize those
+ * Merkle tree blocks, so they are left with garbage and marked uptodate.
+ *
+ * Once the descriptor has been read, we invalidate the page containing it.
+ * The descriptor for this inode is then searchable in the hashtable, so iomap
+ * can synthesize these blocks when they are requested again.
+ */
+static int
+xfs_fsverity_drop_descriptor_page(
+	struct inode		*inode,
+	u64			offset)
+{
+	pgoff_t			index = offset >> PAGE_SHIFT;
+
+	return invalidate_inode_pages2_range(inode->i_mapping, index, index);
+}
+
+/*
+ * Retrieve the verity descriptor.
+ */
+static int
+xfs_fsverity_get_descriptor(
+	struct inode		*inode,
+	void			*buf,
+	size_t			buf_size)
+{
+	struct xfs_inode	*ip = XFS_I(inode);
+	struct xfs_mount	*mp = ip->i_mount;
+	__be32			d_desc_size;
+	u32			desc_size;
+	u64			desc_size_pos;
+	int			error;
+	u64			desc_pos;
+	struct xfs_bmbt_irec	rec;
+	int			is_empty;
+	uint32_t		blocksize = i_blocksize(VFS_I(ip));
+	xfs_fileoff_t		last_block_offset;
+
+	ASSERT(inode->i_flags & S_VERITY);
+	error = xfs_bmap_last_extent(NULL, ip, XFS_DATA_FORK, &rec, &is_empty);
+	if (error)
+		return error;
+
+	if (is_empty)
+		return -ENODATA;
+
+	last_block_offset =
+		XFS_FSB_TO_B(mp, rec.br_startoff + rec.br_blockcount);
+	if (last_block_offset < XFS_FSVERITY_REGION_START)
+		return -ENODATA;
+
+	desc_size_pos = xfs_fsverity_offset_from_disk(ip, last_block_offset) -
+			sizeof(__be32);
+	error = xfs_fsverity_read(inode, (char *)&d_desc_size,
+			sizeof(d_desc_size), desc_size_pos);
+	if (error)
+		return error;
+
+	desc_size = be32_to_cpu(d_desc_size);
+	if (XFS_IS_CORRUPT(mp, desc_size > FS_VERITY_MAX_DESCRIPTOR_SIZE))
+		return -ERANGE;
+	if (XFS_IS_CORRUPT(mp, desc_size > desc_size_pos))
+		return -ERANGE;
+
+	if (!buf_size)
+		return desc_size;
+
+	if (XFS_IS_CORRUPT(mp, desc_size > buf_size))
+		return -ERANGE;
+
+	desc_pos = round_down(desc_size_pos - desc_size, blocksize);
+	error = xfs_fsverity_read(inode, buf, desc_size, desc_pos);
+	if (error)
+		return error;
+
+	xfs_fsverity_drop_descriptor_page(inode, desc_pos);
+
+	return desc_size;
+}
+
+static int
+xfs_fsverity_write_descriptor(
+	struct file		*file,
+	const void		*desc,
+	u32			desc_size,
+	u64			merkle_tree_size)
+{
+	int			error;
+	struct inode		*inode = file_inode(file);
+	struct xfs_inode	*ip = XFS_I(inode);
+	unsigned int		blksize = ip->i_mount->m_attr_geo->blksize;
+	u64			tree_last_block =
+		fsverity_metadata_offset(VFS_I(ip)) + merkle_tree_size;
+	u64			desc_pos = round_up(tree_last_block, blksize);
+	u64			desc_end = desc_pos + desc_size;
+	__be32			desc_size_disk = cpu_to_be32(desc_size);
+	u64			desc_size_pos =
+		round_up(desc_end + sizeof(desc_size_disk), blksize) -
+		sizeof(desc_size_disk);
+
+	error = xfs_fsverity_write(file, desc_size_pos,
+			sizeof(__be32),
+			(const void *)&desc_size_disk);
+	if (error)
+		return error;
+
+	error = xfs_fsverity_write(file, desc_pos, desc_size, desc);
+
+	xfs_fsverity_drop_descriptor_page(inode, desc_pos);
+	return error;
+}
+
+/*
+ * Try to remove all the fsverity metadata after a failed enablement.
+ */
+static int
+xfs_fsverity_delete_metadata(
+	struct xfs_inode	*ip)
+{
+	struct xfs_trans	*tp;
+	struct xfs_mount	*mp = ip->i_mount;
+	int			error;
+
+	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, 0, 0, 0, &tp);
+	if (error)
+		return error;
+
+	xfs_ilock(ip, XFS_ILOCK_EXCL);
+	xfs_trans_ijoin(tp, ip, 0);
+
+	/*
+	 * We are removing post-EOF data; there is no need to update i_size,
+	 * as fsverity didn't move i_size in the first place.
+	 */
+	error = xfs_itruncate_extents(&tp, ip, XFS_DATA_FORK, XFS_ISIZE(ip));
+	if (error)
+		goto err_cancel;
+
+	/* xfs_trans_commit() frees tp even if it fails; don't cancel it */
+	error = xfs_trans_commit(tp);
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+	return error;
+
+err_cancel:
+	xfs_trans_cancel(tp);
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+	return error;
+}
+
+/*
+ * Prepare to enable fsverity by clearing old metadata.
+ */
+static int
+xfs_fsverity_begin_enable(
+	struct file		*filp)
+{
+	struct inode		*inode = file_inode(filp);
+	struct xfs_inode	*ip = XFS_I(inode);
+	int			error;
+
+	xfs_assert_ilocked(ip, XFS_IOLOCK_EXCL);
+
+	if (IS_DAX(inode))
+		return -EINVAL;
+
+	if (inode->i_size > XFS_FSVERITY_REGION_START)
+		return -EFBIG;
+
+	if (xfs_iflags_test_and_set(ip, XFS_VERITY_CONSTRUCTION))
+		return -EBUSY;
+
+	error = xfs_qm_dqattach(ip);
+	if (error)
+		goto out_clear;
+
+	/*
+	 * Flush the page cache before building the Merkle tree. The inode is
+	 * locked, so no further writes except fsverity metadata will happen.
+	 */
+	error = filemap_write_and_wait(inode->i_mapping);
+	if (error)
+		goto out_clear;
+
+	error = xfs_fsverity_delete_metadata(ip);
+	if (!error)
+		return 0;
+
+out_clear:
+	/* ->end_enable_verity() is not called when this function fails */
+	xfs_iflags_clear(ip, XFS_VERITY_CONSTRUCTION);
+	return error;
+}
+
+/*
+ * Complete (or fail) the process of enabling fsverity.
+ */
+static int
+xfs_fsverity_end_enable(
+	struct file		*file,
+	const void		*desc,
+	size_t			desc_size,
+	u64			merkle_tree_size)
+{
+	struct inode		*inode = file_inode(file);
+	struct xfs_inode	*ip = XFS_I(inode);
+	struct xfs_mount	*mp = ip->i_mount;
+	struct xfs_trans	*tp;
+	int			error = 0;
+
+	xfs_assert_ilocked(ip, XFS_IOLOCK_EXCL);
+
+	/* fs-verity failed; just clean up the metadata written so far */
+	if (desc == NULL)
+		goto out;
+
+	error = xfs_fsverity_write_descriptor(file, desc, desc_size,
+			merkle_tree_size);
+	if (error)
+		goto out;
+
+	/*
+	 * Wait for the Merkle tree to be written to disk before setting the
+	 * on-disk inode flag and clearing XFS_VERITY_CONSTRUCTION.
+	 */
+	error = filemap_write_and_wait(inode->i_mapping);
+	if (error)
+		goto out;
+
+	/*
+	 * Set the fsverity inode flag.
+	 */
+	error = xfs_trans_alloc_inode(ip, &M_RES(mp)->tr_ichange,
+			0, 0, false, &tp);
+	if (error)
+		goto out;
+
+	/*
+	 * Ensure that we've persisted the verity information before we enable
+	 * it on the inode and tell the caller we have sealed the inode.
+	 */
+	ip->i_diflags2 |= XFS_DIFLAG2_VERITY;
+
+	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
+	xfs_trans_set_sync(tp);
+
+	error = xfs_trans_commit(tp);
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+
+	if (!error)
+		inode->i_flags |= S_VERITY;
+
+out:
+	if (!desc || error) {
+		int	error2;
+
+		error2 = xfs_fsverity_delete_metadata(ip);
+		if (error2)
+			xfs_alert(ip->i_mount,
+"ino 0x%llx failed to clean up new fsverity metadata, err %d",
+					ip->i_ino, error2);
+	}
+
+	xfs_iflags_clear(ip, XFS_VERITY_CONSTRUCTION);
+	return error;
+}
+
+/*
+ * Retrieve a Merkle tree page.
+ */
+static struct page *
+xfs_fsverity_read_merkle(
+	struct inode		*inode,
+	pgoff_t			index)
+{
+	pgoff_t			metadata_idx =
+		(fsverity_metadata_offset(inode) >> PAGE_SHIFT);
+	pgoff_t			idx = index + metadata_idx;
+
+	return generic_read_merkle_tree_page(inode, idx);
+}
+
+/*
+ * Read ahead Merkle tree pages.
+ */
+static void
+xfs_fsverity_readahead_merkle_tree(
+	struct inode		*inode,
+	pgoff_t			index,
+	unsigned long		nr_pages)
+{
+	pgoff_t			metadata_idx =
+		(fsverity_metadata_offset(inode) >> PAGE_SHIFT);
+	pgoff_t			idx = index + metadata_idx;
+
+	generic_readahead_merkle_tree(inode, idx, nr_pages);
+}
+
+/*
+ * Write a Merkle tree block.
+ */
+static int
+xfs_fsverity_write_merkle(
+	struct file		*file,
+	const void		*buf,
+	u64			pos,
+	unsigned int		size,
+	const u8		*zero_digest,
+	unsigned int		digest_size)
+{
+	struct inode		*inode = file_inode(file);
+	struct xfs_inode	*ip = XFS_I(inode);
+	loff_t			position = pos + fsverity_metadata_offset(inode);
+	const char		*p;
+	unsigned int		i;
+
+	if (position + size > inode->i_sb->s_maxbytes)
+		return -EFBIG;
+
+	/*
+	 * If this is a block full of hashes of zeroed blocks, don't bother
+	 * storing it; it can be synthesized later.
+	 *
+	 * However, do this only if the Merkle tree block size equals the fs
+	 * block size. Iomap synthesizes these blocks from holes in the Merkle
+	 * tree, and it can't tell which part of a partially written fs block
+	 * needs to be synthesized. For example, with a 4k filesystem block:
+	 *
+	 * [ 1k | zero hashes | zero hashes | 1k ]
+	 *
+	 * Iomap won't know about these empty blocks.
+	 */
+	for (i = 0, p = buf; i < size; i += digest_size, p += digest_size)
+		if (memcmp(p, zero_digest, digest_size))
+			break;
+	if ((i == size) && (size == ip->i_mount->m_sb.sb_blocksize))
+		return 0;
+
+	return xfs_fsverity_write(file, position, size, buf);
+}
+
+const struct fsverity_operations xfs_fsverity_ops = {
+	.begin_enable_verity		= xfs_fsverity_begin_enable,
+	.end_enable_verity		= xfs_fsverity_end_enable,
+	.get_verity_descriptor		= xfs_fsverity_get_descriptor,
+	.read_merkle_tree_page		= xfs_fsverity_read_merkle,
+	.readahead_merkle_tree		= xfs_fsverity_readahead_merkle_tree,
+	.write_merkle_tree_block	= xfs_fsverity_write_merkle,
+};
diff --git a/fs/xfs/xfs_fsverity.h b/fs/xfs/xfs_fsverity.h
index ab01ceef4d15..91bcb86d1b3b 100644
--- a/fs/xfs/xfs_fsverity.h
+++ b/fs/xfs/xfs_fsverity.h
@@ -6,8 +6,10 @@
 #define __XFS_FSVERITY_H__
 
 #include "xfs.h"
+#include
 
 #ifdef CONFIG_FS_VERITY
+extern const struct fsverity_operations xfs_fsverity_ops;
 bool xfs_fsverity_sealed_data(const struct xfs_inode *ip, loff_t offset);
 loff_t xfs_fsverity_offset_to_disk(struct xfs_inode *ip, loff_t pos);
 
diff --git a/fs/xfs/xfs_message.c b/fs/xfs/xfs_message.c
index 19aba2c3d525..c6eb68c3e50f 100644
--- a/fs/xfs/xfs_message.c
+++ b/fs/xfs/xfs_message.c
@@ -161,6 +161,10 @@ xfs_warn_experimental(
 			.opstate	= XFS_OPSTATE_WARNED_ZONED,
 			.name		= "zoned RT device",
 		},
+		[XFS_EXPERIMENTAL_FSVERITY] = {
+			.opstate	= XFS_OPSTATE_WARNED_FSVERITY,
+			.name		= "fsverity",
+		},
 	};
 	ASSERT(feat >= 0 && feat < XFS_EXPERIMENTAL_MAX);
 	BUILD_BUG_ON(ARRAY_SIZE(features) != XFS_EXPERIMENTAL_MAX);
diff --git a/fs/xfs/xfs_message.h b/fs/xfs/xfs_message.h
index d68e72379f9d..1647d32ea4ac 100644
--- a/fs/xfs/xfs_message.h
+++ b/fs/xfs/xfs_message.h
@@ -96,6 +96,7 @@ enum xfs_experimental_feat {
 	XFS_EXPERIMENTAL_LBS,
 	XFS_EXPERIMENTAL_METADIR,
 	XFS_EXPERIMENTAL_ZONED,
+	XFS_EXPERIMENTAL_FSVERITY,
 
 	XFS_EXPERIMENTAL_MAX,
 };
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index 8ef7fea8b325..4fa9b28482ac 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -579,6 +579,8 @@ __XFS_HAS_FEAT(nouuid, NOUUID)
 #define XFS_OPSTATE_WARNED_ZONED	19
 /* (Zoned) GC is in progress */
 #define XFS_OPSTATE_ZONEGC_RUNNING	20
+/* Kernel has logged a warning about fsverity support */
+#define XFS_OPSTATE_WARNED_FSVERITY	21
 
 #define __XFS_IS_OPSTATE(name, NAME) \
 static inline bool xfs_is_ ## name (struct xfs_mount *mp) \
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index bc71aa9dcee8..ac385955cbf4 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -30,6 +30,7 @@
 #include "xfs_filestream.h"
 #include "xfs_quota.h"
 #include "xfs_sysfs.h"
+#include "xfs_fsverity.h"
 #include "xfs_ondisk.h"
 #include "xfs_rmap_item.h"
 #include "xfs_refcount_item.h"
@@ -1704,6 +1705,9 @@ xfs_fs_fill_super(
 	sb->s_quota_types = QTYPE_MASK_USR | QTYPE_MASK_GRP | QTYPE_MASK_PRJ;
 #endif
 	sb->s_op = &xfs_super_operations;
+#ifdef CONFIG_FS_VERITY
+	sb->s_vop = &xfs_fsverity_ops;
+#endif
 
 	/*
 	 * Delay mount work if the debug hook is set. This is debug
@@ -1961,6 +1965,9 @@ xfs_fs_fill_super(
 	if (error)
 		goto out_filestream_unmount;
 
+	if (xfs_has_verity(mp))
+		xfs_warn_experimental(mp, XFS_EXPERIMENTAL_FSVERITY);
+
 	root = igrab(VFS_I(mp->m_rootip));
 	if (!root) {
 		error = -ENOENT;
-- 
2.51.2
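The zero-hash skipping rule in xfs_fsverity_write_merkle() above can
likewise be modelled outside the kernel. The function below is a
hypothetical sketch, not part of the patch; like the kernel loop, it
assumes the Merkle tree block size is a multiple of the digest size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical userspace model of the check in xfs_fsverity_write_merkle():
 * a Merkle tree block may be left as a hole only if every digest in it
 * equals the digest of a zeroed data block AND the Merkle tree block is
 * exactly one filesystem block.
 */
static bool merkle_block_can_be_hole(const unsigned char *buf,
				     size_t block_size,
				     const unsigned char *zero_digest,
				     size_t digest_size,
				     size_t fs_block_size)
{
	size_t i;

	/* Assumption: digests tile the block exactly, so no overrun. */
	assert(digest_size && block_size % digest_size == 0);

	for (i = 0; i < block_size; i += digest_size)
		if (memcmp(buf + i, zero_digest, digest_size) != 0)
			return false;	/* a real hash: must be written */

	/* All zero hashes: skip only if iomap can later see it as a hole. */
	return block_size == fs_block_size;
}
```

With 1k Merkle tree blocks on a 4k-block filesystem it always returns
false, so such blocks are written out in full even when they contain only
zero hashes.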