Date: Mon, 19 Jan 2026 07:33:49 +0100
From: Christoph Hellwig
To: Andrey Albershteyn
Cc: "Darrick J. Wong", Matthew Wilcox, fsverity@lists.linux.dev, linux-xfs@vger.kernel.org, ebiggers@kernel.org, linux-fsdevel@vger.kernel.org, aalbersh@kernel.org, david@fromorbit.com, hch@lst.de, tytso@mit.edu, linux-ext4@vger.kernel.org, jaegeuk@kernel.org, chao@kernel.org, linux-f2fs-devel@lists.sourceforge.net
Subject: fsverity metadata offset, was: Re: [PATCH v2 0/23] fs-verity support for XFS with post EOF merkle tree
Message-ID: <20260119063349.GA643@lst.de>
References: <20260114061536.GG15551@frogsfrogsfrogs> <5z5r6jizgxqz5axvzwbdmtkadehgdf7semqy2oxsfytmzzu6ik@zfvhexcp3fz2> <6r24wj3o3gctl3vz4n3tdrfjx5ftkybdjmmye2hejdcdl6qseh@c2yvpd5d4ocf>
In-Reply-To: <6r24wj3o3gctl3vz4n3tdrfjx5ftkybdjmmye2hejdcdl6qseh@c2yvpd5d4ocf>

While looking at fsverity, I'd like to understand the choice of metadata
offset in ext4 and f2fs, and I'm wondering about a potential issue.

Both ext4 and f2fs round the inode size up to the next 64k boundary and
place the metadata there. Unfortunately, both hardcode the magic number
65536 for this instead of using a well-documented constant. I assume this
was picked to align to the largest reasonable page size? If so, there are
two problems with that:

 a) not all architectures are reasonable. As Darrick pointed out, hexagon
    seems to support page sizes up to 1MiB. I don't know whether such
    systems exist in real life, but powerpc supports pages of up to 256KiB,
    and I know those are used for real in various embedded settings.

 b) with large folio support in the page cache, the folios used to map
    files can be much larger than the base page size, with all the same
    issues as a larger page size.

So, assuming that fsverity is trying to avoid a page/folio that covers
both data and fsverity metadata, how does it cope with that?
Do we need to disable fsverity on page sizes larger than 64k, and disable
large folios on fsverity files? The latter would also mean writing back
all cached data first.

And going forward, should we have a v2 format that fixes this? For that
we'd still need a maximum folio size, of course.

And of course I'd like to get all of this right from the start in XFS,
while still staying as similar as possible to ext4/f2fs.

On Wed, Jan 14, 2026 at 10:53:00AM +0100, Andrey Albershteyn wrote:
> On 2026-01-14 09:20:34, Andrey Albershteyn wrote:
> > On 2026-01-13 22:15:36, Darrick J. Wong wrote:
> > > On Wed, Jan 14, 2026 at 05:00:47AM +0000, Matthew Wilcox wrote:
> > > > On Tue, Jan 13, 2026 at 07:45:47PM +0100, Andrey Albershteyn wrote:
> > > > > On 2026-01-13 16:36:44, Matthew Wilcox wrote:
> > > > > > On Mon, Jan 12, 2026 at 03:49:44PM +0100, Andrey Albershteyn wrote:
> > > > > > > The tree is read by iomap into page cache at offset 1 << 53. This is far
> > > > > > > enough to handle any supported file size.
> > > > > >
> > > > > > What happens on 32-bit systems? (I presume you mean "offset" as
> > > > > > "index", so this is 1 << 65 bytes on machines with a 4KiB page size)
> > > > > >
> > > > > it's in bytes, yeah I missed 32-bit systems, I think I will try to
> > > > > convert this offset to something lower on 32-bit in iomap, as
> > > > > Darrick suggested.
> > > >
> > > > Hm, we use all 32 bits of folio->index on 32-bit platforms. That's
> > > > MAX_LFS_FILESIZE. Are you proposing reducing that?
> > > >
> > > > There are some other (performance) penalties to using 1<<53 as the lowest
> > > > index for metadata on 64-bit. The radix tree is going to go quite high;
> > > > we use 6 bits at each level, so if you have a folio at 0 and a folio at
> > > > 1<<53, you'll have a tree of height 9 and use 17 nodes.
> > > >
> > > > That's going to be a lot of extra cache misses when walking the XArray
> > > > to find any given folio. Allowing the filesystem to decide where the
> > > > metadata starts for any given file really is an important optimisation.
> > > > Even if it starts at index 1<<29, you'll almost halve the number of
> > > > nodes needed.
> >
> > Thanks for this overview!
> >
> > >
> > > 1<<53 is only the location of the fsverity metadata in the ondisk
> > > mapping. For the incore mapping, in theory we could load the fsverity
> > > anywhere in the post-EOF part of the pagecache to save some bits.
> > >
> > > roundup(i_size_read(), 1<
> >
> > Then, there's probably no benefit to having the ondisk mapping differ,
> > no?
>
> oh, the fixed ondisk offset will help to not break if the filesystem
> is mounted by a machine with a different page size.
>
> --
> - Andrey
---end quoted text---