* fsverity metadata offset, was: Re: [PATCH v2 0/23] fs-verity support for XFS with post EOF merkle tree [not found] ` <6r24wj3o3gctl3vz4n3tdrfjx5ftkybdjmmye2hejdcdl6qseh@c2yvpd5d4ocf> @ 2026-01-19 6:33 ` Christoph Hellwig 2026-01-19 19:32 ` Eric Biggers 0 siblings, 1 reply; 8+ messages in thread From: Christoph Hellwig @ 2026-01-19 6:33 UTC (permalink / raw) To: Andrey Albershteyn Cc: Darrick J. Wong, Matthew Wilcox, fsverity, linux-xfs, ebiggers, linux-fsdevel, aalbersh, david, hch, tytso, linux-ext4, jaegeuk, chao, linux-f2fs-devel While looking at fsverity I'd like to understand the choice of offset in ext4 and f2fs, and wonder about an issue. Both ext4 and f2fs round up the inode size to the next 64k boundary and place the metadata there. Both use the 65536 magic number for that instead of a well documented constant unfortunately. I assume this was picked to align up to the largest reasonable page size? Unfortunately for that: a) not all architectures are reasonable. As Darrick pointed out hexagon seems to support page size up to 1MiB. While I don't know if they exist in real life, powerpc supports up to 256kiB pages, and I know they are used for real in various embedded settings b) with large folio support in the page cache, the folios used to map files can be much larger than the base page size, with all the same issues as a larger page size So assuming that fsverity is trying to avoid the issue of a page/folio that covers both data and fsverity metadata, how does it cope with that? Do we need to disable fsverity on > 64k page size and disable large folios on fsverity files? The latter would mean writing back all cached data first as well. And going forward, should we have a v2 format that fixes this? For that we'd still need a maximum folio size of course. And of course I'd like to get all these things right from the start in XFS, while still being as similar as possible to ext4/f2fs.
On Wed, Jan 14, 2026 at 10:53:00AM +0100, Andrey Albershteyn wrote: > On 2026-01-14 09:20:34, Andrey Albershteyn wrote: > > On 2026-01-13 22:15:36, Darrick J. Wong wrote: > > > On Wed, Jan 14, 2026 at 05:00:47AM +0000, Matthew Wilcox wrote: > > > > On Tue, Jan 13, 2026 at 07:45:47PM +0100, Andrey Albershteyn wrote: > > > > > On 2026-01-13 16:36:44, Matthew Wilcox wrote: > > > > > > On Mon, Jan 12, 2026 at 03:49:44PM +0100, Andrey Albershteyn wrote: > > > > > > > The tree is read by iomap into page cache at offset 1 << 53. This is far > > > > > > > enough to handle any supported file size. > > > > > > > > > > > > What happens on 32-bit systems? (I presume you mean "offset" as > > > > > > "index", so this is 1 << 65 bytes on machines with a 4KiB page size) > > > > > > > > > > > it's in bytes, yeah I missed 32-bit systems, I think I will try to > > > > > convert this offset to something lower on 32-bit in iomap, as > > > > > Darrick suggested. > > > > Hm, we use all 32 bits of folio->index on 32-bit platforms. That's > > > > MAX_LFS_FILESIZE. Are you proposing reducing that? > > > > There are some other (performance) penalties to using 1<<53 as the lowest > > > > index for metadata on 64-bit. The radix tree is going to go quite high; > > > > we use 6 bits at each level, so if you have a folio at 0 and a folio at > > > > 1<<53, you'll have a tree of height 9 and use 17 nodes. > > > > That's going to be a lot of extra cache misses when walking the XArray > > > > to find any given folio. Allowing the filesystem to decide where the > > > > metadata starts for any given file really is an important optimisation. > > > > Even if it starts at index 1<<29, you'll almost halve the number of > > > > nodes needed. > > Thanks for this overview! > > > > 1<<53 is only the location of the fsverity metadata in the ondisk > > > mapping. 
For the incore mapping, in theory we could load the fsverity > > > anywhere in the post-EOF part of the pagecache to save some bits. > > > > > > roundup(i_size_read(), 1<<folio_max_order) would work, right? > > Then, there's probably no benefits to have ondisk mapping differ, > > no? > > oh, the fixed ondisk offset will help to not break if filesystem > would be mounted by machine with different page size. > > -- > - Andrey ---end quoted text--- ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: fsverity metadata offset, was: Re: [PATCH v2 0/23] fs-verity support for XFS with post EOF merkle tree 2026-01-19 6:33 ` fsverity metadata offset, was: Re: [PATCH v2 0/23] fs-verity support for XFS with post EOF merkle tree Christoph Hellwig @ 2026-01-19 19:32 ` Eric Biggers 2026-01-19 19:58 ` Darrick J. Wong 2026-01-19 20:00 ` Matthew Wilcox 0 siblings, 2 replies; 8+ messages in thread From: Eric Biggers @ 2026-01-19 19:32 UTC (permalink / raw) To: Christoph Hellwig Cc: Andrey Albershteyn, Darrick J. Wong, Matthew Wilcox, fsverity, linux-xfs, linux-fsdevel, aalbersh, david, tytso, linux-ext4, jaegeuk, chao, linux-f2fs-devel On Mon, Jan 19, 2026 at 07:33:49AM +0100, Christoph Hellwig wrote: > While looking at fsverity I'd like to understand the choice of offset > in ext4 and f2fs, and wonder about an issue. > > Both ext4 and f2fs round up the inode size to the next 64k boundary > and place the metadata there. Both use the 65536 magic number for that > instead of a well documented constant unfortunately. > > I assume this was picked to align up to the largest reasonable page > size? Unfortunately for that: > > a) not all architectures are reasonable. As Darrick pointed out > hexagon seems to support page size up to 1MiB. While I don't know > if they exist in real life, powerpc supports up to 256kiB pages, > and I know they are used for real in various embedded settings > b) with large folio support in the page cache, the folios used to > map files can be much larger than the base page size, with all > the same issues as a larger page size > > So assuming that fsverity is trying to avoid the issue of a page/folio > that covers both data and fsverity metadata, how does it cope with that? > Do we need to disable fsverity on > 64k page size and disable large > folios on fsverity files? The latter would mean writing back all cached > data first as well. > > And going forward, should we have a v2 format that fixes this? 
For that > we'd still need a maximum folio size of course. And of course I'd like > to get all these things right from the start in XFS, while still being as > similar as possible to ext4/f2fs. Yes, if I recall correctly it was intended to be the "largest reasonable page size". It looks like PAGE_SIZE > 65536 can't work as-is, so indeed we should disable fsverity support in that configuration. I don't think large folios are quite as problematic. ext4_read_merkle_tree_page() and f2fs_read_merkle_tree_page() read a folio and return the appropriate page in it, and fs/verity/verify.c operates on the page. If it's a page in the folio that spans EOF, I think everything will actually still work, except userspace will be able to see Merkle tree data after a 64K boundary past EOF if the file is mmapped using huge pages. The mmap issue isn't great, but I'm not sure how much it matters, especially when the zeroes do still go up to a 64K boundary. If we do need to fix this, there are a couple things we could consider doing without changing the on-disk format in ext4 or f2fs: putting the data in the page cache at a different offset than it exists on-disk, or using "small" pages for EOF specifically. But yes, XFS should choose a larger alignment than 64K. - Eric ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: fsverity metadata offset, was: Re: [PATCH v2 0/23] fs-verity support for XFS with post EOF merkle tree 2026-01-19 19:32 ` Eric Biggers @ 2026-01-19 19:58 ` Darrick J. Wong 2026-01-20 7:32 ` Christoph Hellwig 2026-01-19 20:00 ` Matthew Wilcox 1 sibling, 1 reply; 8+ messages in thread From: Darrick J. Wong @ 2026-01-19 19:58 UTC (permalink / raw) To: Eric Biggers Cc: Christoph Hellwig, Andrey Albershteyn, Matthew Wilcox, fsverity, linux-xfs, linux-fsdevel, aalbersh, david, tytso, linux-ext4, jaegeuk, chao, linux-f2fs-devel On Mon, Jan 19, 2026 at 11:32:42AM -0800, Eric Biggers wrote: > On Mon, Jan 19, 2026 at 07:33:49AM +0100, Christoph Hellwig wrote: > > While looking at fsverity I'd like to understand the choice of offset > > in ext4 and f2fs, and wonder about an issue. > > > > Both ext4 and f2fs round up the inode size to the next 64k boundary > > and place the metadata there. Both use the 65536 magic number for that > > instead of a well documented constant unfortunately. > > > > I assume this was picked to align up to the largest reasonable page > > size? Unfortunately for that: > > > > a) not all architectures are reasonable. As Darrick pointed out > > hexagon seems to support page size up to 1MiB. While I don't know > > if they exist in real life, powerpc supports up to 256kiB pages, > > and I know they are used for real in various embedded settings They *did* way back in the day, I worked with some seekrit PPC440s early in my career. I don't know that any of them still exist, but the code is still there... > > b) with large folio support in the page cache, the folios used to > > map files can be much larger than the base page size, with all > > the same issues as a larger page size > > > > So assuming that fsverity is trying to avoid the issue of a page/folio > > that covers both data and fsverity metadata, how does it cope with that? > > Do we need to disable fsverity on > 64k page size and disable large > > folios on fsverity files? 
The latter would mean writing back all cached > > data first as well. > > > > And going forward, should we have a v2 format that fixes this? For that > > we'd still need a maximum folio size of course. And of course I'd like > > to get all these things right from the start in XFS, while still being as > > similar as possible to ext4/f2fs. > > Yes, if I recall correctly it was intended to be the "largest reasonable > page size". It looks like PAGE_SIZE > 65536 can't work as-is, so indeed > we should disable fsverity support in that configuration. > > I don't think large folios are quite as problematic. > ext4_read_merkle_tree_page() and f2fs_read_merkle_tree_page() read a > folio and return the appropriate page in it, and fs/verity/verify.c > operates on the page. If it's a page in the folio that spans EOF, I > think everything will actually still work, except userspace will be able > to see Merkle tree data after a 64K boundary past EOF if the file is > mmapped using huge pages. We don't allow mmapping file data beyond the EOF basepage, even if the underlying folio is a large folio. See generic/749, though recently Kiryl Shutsemau tried to remove that restriction[1], until dchinner and willy told him no. > The mmap issue isn't great, but I'm not sure how much it matters, > especially when the zeroes do still go up to a 64K boundary. I'm concerned that post-eof zeroing of a 256k folio could accidentally obliterate merkle tree content that was somehow previously loaded. Though afaict from the existing codebases, none of them actually make that mistake. > If we do need to fix this, there are a couple things we could consider > doing without changing the on-disk format in ext4 or f2fs: putting the > data in the page cache at a different offset than it exists on-disk, or > using "small" pages for EOF specifically. 
I'd leave the ondisk offset as-is, but change the pagecache offset to roundup(i_size_read(), mapping_max_folio_size_supported()) just to keep file data and fsverity metadata completely separate. > But yes, XFS should choose a larger alignment than 64K. The roundup() formula above is what I'd choose for the pagecache offset for xfs. The ondisk offset of 1<<53 is ok with me. --D [1] https://lore.kernel.org/linux-fsdevel/20251014175214.GW6188@frogsfrogsfrogs/ ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: fsverity metadata offset, was: Re: [PATCH v2 0/23] fs-verity support for XFS with post EOF merkle tree 2026-01-19 19:58 ` Darrick J. Wong @ 2026-01-20 7:32 ` Christoph Hellwig 2026-01-20 11:44 ` Andrey Albershteyn 0 siblings, 1 reply; 8+ messages in thread From: Christoph Hellwig @ 2026-01-20 7:32 UTC (permalink / raw) To: Darrick J. Wong Cc: Eric Biggers, Christoph Hellwig, Andrey Albershteyn, Matthew Wilcox, fsverity, linux-xfs, linux-fsdevel, aalbersh, david, tytso, linux-ext4, jaegeuk, chao, linux-f2fs-devel On Mon, Jan 19, 2026 at 11:58:16AM -0800, Darrick J. Wong wrote: > > > a) not all architectures are reasonable. As Darrick pointed out > > > hexagon seems to support page size up to 1MiB. While I don't know > > > if they exist in real life, powerpc supports up to 256kiB pages, > > > and I know they are used for real in various embedded settings > > They *did* way back in the day, I worked with some seekrit PPC440s early > in my career. I don't know that any of them still exist, but the code > is still there... Sorry, I meant I don't really know how real the hexagon large page sizes are. I know about the ppcs one personally, too. > > If we do need to fix this, there are a couple things we could consider > > doing without changing the on-disk format in ext4 or f2fs: putting the > > data in the page cache at a different offset than it exists on-disk, or > > using "small" pages for EOF specifically. > > I'd leave the ondisk offset as-is, but change the pagecache offset to > roundup(i_size_read(), mapping_max_folio_size_supported()) just to keep > file data and fsverity metadata completely separate. Can we find a way to do that in common code and make ext4 and f2fs do the same? ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: fsverity metadata offset, was: Re: [PATCH v2 0/23] fs-verity support for XFS with post EOF merkle tree 2026-01-20 7:32 ` Christoph Hellwig @ 2026-01-20 11:44 ` Andrey Albershteyn 2026-01-20 17:34 ` Darrick J. Wong 2026-01-21 15:03 ` Christoph Hellwig 0 siblings, 2 replies; 8+ messages in thread From: Andrey Albershteyn @ 2026-01-20 11:44 UTC (permalink / raw) To: Christoph Hellwig Cc: Darrick J. Wong, Eric Biggers, Matthew Wilcox, fsverity, linux-xfs, linux-fsdevel, aalbersh, david, tytso, linux-ext4, jaegeuk, chao, linux-f2fs-devel On 2026-01-20 08:32:18, Christoph Hellwig wrote: > On Mon, Jan 19, 2026 at 11:58:16AM -0800, Darrick J. Wong wrote: > > > > a) not all architectures are reasonable. As Darrick pointed out > > > > hexagon seems to support page size up to 1MiB. While I don't know > > > > if they exist in real life, powerpc supports up to 256kiB pages, > > > > and I know they are used for real in various embedded settings > > > > They *did* way back in the day, I worked with some seekrit PPC440s early > > in my career. I don't know that any of them still exist, but the code > > is still there... > > Sorry, I meant I don't really know how real the hexagon large page > sizes are. I know about the ppcs one personally, too. > > > > If we do need to fix this, there are a couple things we could consider > > > doing without changing the on-disk format in ext4 or f2fs: putting the > > > data in the page cache at a different offset than it exists on-disk, or > > > using "small" pages for EOF specifically. > > > > I'd leave the ondisk offset as-is, but change the pagecache offset to > > roundup(i_size_read(), mapping_max_folio_size_supported()) just to keep > > file data and fsverity metadata completely separate. > > Can we find a way to do that in common code and make ext4 and f2fs do > the same? 
hmm I don't see what else we could do except providing common offset and then use it to map blocks loff_t fsverity_metadata_offset(struct inode *inode) { return roundup(i_size_read(), mapping_max_folio_size_supported()); } -- - Andrey ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: fsverity metadata offset, was: Re: [PATCH v2 0/23] fs-verity support for XFS with post EOF merkle tree 2026-01-20 11:44 ` Andrey Albershteyn @ 2026-01-20 17:34 ` Darrick J. Wong 2026-01-21 15:03 ` Christoph Hellwig 1 sibling, 0 replies; 8+ messages in thread From: Darrick J. Wong @ 2026-01-20 17:34 UTC (permalink / raw) To: Andrey Albershteyn Cc: Christoph Hellwig, Eric Biggers, Matthew Wilcox, fsverity, linux-xfs, linux-fsdevel, aalbersh, david, tytso, linux-ext4, jaegeuk, chao, linux-f2fs-devel On Tue, Jan 20, 2026 at 12:44:19PM +0100, Andrey Albershteyn wrote: > On 2026-01-20 08:32:18, Christoph Hellwig wrote: > > On Mon, Jan 19, 2026 at 11:58:16AM -0800, Darrick J. Wong wrote: > > > > > a) not all architectures are reasonable. As Darrick pointed out > > > > > hexagon seems to support page size up to 1MiB. While I don't know > > > > > if they exist in real life, powerpc supports up to 256kiB pages, > > > > > and I know they are used for real in various embedded settings > > > > > > They *did* way back in the day, I worked with some seekrit PPC440s early > > > in my career. I don't know that any of them still exist, but the code > > > is still there... > > > > Sorry, I meant I don't really know how real the hexagon large page > > sizes are. I know about the ppcs one personally, too. > > > > > > If we do need to fix this, there are a couple things we could consider > > > > doing without changing the on-disk format in ext4 or f2fs: putting the > > > > data in the page cache at a different offset than it exists on-disk, or > > > > using "small" pages for EOF specifically. > > > > > > I'd leave the ondisk offset as-is, but change the pagecache offset to > > > roundup(i_size_read(), mapping_max_folio_size_supported()) just to keep > > > file data and fsverity metadata completely separate. > > > > Can we find a way to do that in common code and make ext4 and f2fs do > > the same? 
> > hmm I don't see what else we could do except providing common offset > and then use it to map blocks > > loff_t fsverity_metadata_offset(struct inode *inode) > { > return roundup(i_size_read(), mapping_max_folio_size_supported()); > } Yeah, that's probably the best we can do. Please add a comment to that helper to state explicitly that this is the *incore* file offset of the merkle tree if the filesystem decides to cache it in the pagecache. --D > -- > - Andrey > > ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: fsverity metadata offset, was: Re: [PATCH v2 0/23] fs-verity support for XFS with post EOF merkle tree 2026-01-20 11:44 ` Andrey Albershteyn 2026-01-20 17:34 ` Darrick J. Wong @ 2026-01-21 15:03 ` Christoph Hellwig 1 sibling, 0 replies; 8+ messages in thread From: Christoph Hellwig @ 2026-01-21 15:03 UTC (permalink / raw) To: Andrey Albershteyn Cc: Christoph Hellwig, Darrick J. Wong, Eric Biggers, Matthew Wilcox, fsverity, linux-xfs, linux-fsdevel, aalbersh, david, tytso, linux-ext4, jaegeuk, chao, linux-f2fs-devel On Tue, Jan 20, 2026 at 12:44:19PM +0100, Andrey Albershteyn wrote: > > > I'd leave the ondisk offset as-is, but change the pagecache offset to > > > roundup(i_size_read(), mapping_max_folio_size_supported()) just to keep > > > file data and fsverity metadata completely separate. > > > > Can we find a way to do that in common code and make ext4 and f2fs do > > the same? > > hmm I don't see what else we could do except providing common offset > and then use it to map blocks > > loff_t fsverity_metadata_offset(struct inode *inode) > { > return roundup(i_size_read(), mapping_max_folio_size_supported()); > } Something like that, yes. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: fsverity metadata offset, was: Re: [PATCH v2 0/23] fs-verity support for XFS with post EOF merkle tree 2026-01-19 19:32 ` Eric Biggers 2026-01-19 19:58 ` Darrick J. Wong @ 2026-01-19 20:00 ` Matthew Wilcox 1 sibling, 0 replies; 8+ messages in thread From: Matthew Wilcox @ 2026-01-19 20:00 UTC (permalink / raw) To: Eric Biggers Cc: Christoph Hellwig, Andrey Albershteyn, Darrick J. Wong, fsverity, linux-xfs, linux-fsdevel, aalbersh, david, tytso, linux-ext4, jaegeuk, chao, linux-f2fs-devel On Mon, Jan 19, 2026 at 11:32:42AM -0800, Eric Biggers wrote: > On Mon, Jan 19, 2026 at 07:33:49AM +0100, Christoph Hellwig wrote: > > While looking at fsverity I'd like to understand the choice of offset > > in ext4 and f2fs, and wonder about an issue. > > > > Both ext4 and f2fs round up the inode size to the next 64k boundary > > and place the metadata there. Both use the 65536 magic number for that > > instead of a well documented constant unfortunately. > > > > I assume this was picked to align up to the largest reasonable page > > size? Unfortunately for that: > > > > a) not all architectures are reasonable. As Darrick pointed out > > hexagon seems to support page size up to 1MiB. While I don't know > > if they exist in real life, powerpc supports up to 256kiB pages, > > and I know they are used for real in various embedded settings > > b) with large folio support in the page cache, the folios used to > > map files can be much larger than the base page size, with all > > the same issues as a larger page size > > > > So assuming that fsverity is trying to avoid the issue of a page/folio > > that covers both data and fsverity metadata, how does it cope with that? > > Do we need to disable fsverity on > 64k page size and disable large > > folios on fsverity files? The latter would mean writing back all cached > > data first as well. > > > > And going forward, should we have a v2 format that fixes this? For that > > we'd still need a maximum folio size of course. 
And of course I'd like > > to get all these things right from the start in XFS, while still being as > > similar as possible to ext4/f2fs. > > Yes, if I recall correctly it was intended to be the "largest reasonable > page size". It looks like PAGE_SIZE > 65536 can't work as-is, so indeed > we should disable fsverity support in that configuration. I don't think anybody will weep for lack of fsverity support in these weirdo large PAGE_SIZE configurations. > I don't think large folios are quite as problematic. > ext4_read_merkle_tree_page() and f2fs_read_merkle_tree_page() read a > folio and return the appropriate page in it, and fs/verity/verify.c > operates on the page. If it's a page in the folio that spans EOF, I > think everything will actually still work, except userspace will be able > to see Merkle tree data after a 64K boundary past EOF if the file is > mmapped using huge pages. > > The mmap issue isn't great, but I'm not sure how much it matters, > especially when the zeroes do still go up to a 64K boundary. We actually refuse to map pages after EOF. See filemap_map_pages() if ((file_end >= folio_next_index(folio) || shmem_mapping(mapping)) && filemap_map_pmd(vmf, folio, start_pgoff)) { ret = VM_FAULT_NOPAGE; goto out; } along with the other treatment of end_pgoff. ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2026-01-21 15:03 UTC | newest]
Thread overview: 8+ messages
-- links below jump to the message on this page --
[not found] <cover.1768229271.patch-series@thinky>
[not found] ` <aWZ0nJNVTnyuFTmM@casper.infradead.org>
[not found] ` <op5poqkjoachiv2qfwizunoeg7h6w5x2rxdvbs4vhryr3aywbt@cul2yevayijl>
[not found] ` <aWci_1Uu5XndYNkG@casper.infradead.org>
[not found] ` <20260114061536.GG15551@frogsfrogsfrogs>
[not found] ` <5z5r6jizgxqz5axvzwbdmtkadehgdf7semqy2oxsfytmzzu6ik@zfvhexcp3fz2>
[not found] ` <6r24wj3o3gctl3vz4n3tdrfjx5ftkybdjmmye2hejdcdl6qseh@c2yvpd5d4ocf>
2026-01-19 6:33 ` fsverity metadata offset, was: Re: [PATCH v2 0/23] fs-verity support for XFS with post EOF merkle tree Christoph Hellwig
2026-01-19 19:32 ` Eric Biggers
2026-01-19 19:58 ` Darrick J. Wong
2026-01-20 7:32 ` Christoph Hellwig
2026-01-20 11:44 ` Andrey Albershteyn
2026-01-20 17:34 ` Darrick J. Wong
2026-01-21 15:03 ` Christoph Hellwig
2026-01-19 20:00 ` Matthew Wilcox