From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 08996130A72; Wed, 8 May 2024 20:14:30 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1715199271; cv=none; b=LWLT8cWYQA6biUuoQlHM2T2YhbRYPUkvDZLXyravQN2nDMwVzYLWhyVSVsqsIumdf6YjXGx62wJti+LRY8qezZUd8zLkBz5LU8xyTaBxM+WpkAQ8WoRsjTFdTPmWaZvIK2vRprPyY4AbZADvQWstwzr/a7hCTS2CXqqqN0jrTnE= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1715199271; c=relaxed/simple; bh=L+Ol7kfXBT3RUCBAFLA55mOGHL2uIz7YXQL6uRrTZ8g=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=RvbISo9HNw6R6MZMIwkSa3dyx7j22FiD8CdjTihO9fcYiJPTmoeyPSNQ63YMPWg7zJP+q85VnxuHVo4XXLhmeTviXp/9f6K3BnijOqocS4LaQBZyA8/7QUBj4UwtujGEvXlkVzMVYXNqAwSLx5evmAJBIUopg4wZ3YPpUqgdlTY= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=JNAjxyIa; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="JNAjxyIa" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 6F271C113CC; Wed, 8 May 2024 20:14:30 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1715199270; bh=L+Ol7kfXBT3RUCBAFLA55mOGHL2uIz7YXQL6uRrTZ8g=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=JNAjxyIaIJBuJ8jnihB5qf7eOP0ac85N/NkFMu6sVvLZC0Ku4Uvw+NxkIAttwc7Mq tfSACLAL+k6sd51tr4s+3VT1U320/Zw8mi6RkhibBSO16z2IqBatiAmqPoMSeHePwW r9VdPv9eKNyUYSvSGZMycTDVbgcrKNTZGWuqkYrT3hUpoNEPirPJhxJyULQcUsvgkJ WkAhBOnH345f7IkyvFRDOhofFgv/iYfT0RLhAUzNFzsTwVv3+M59kygyxDEbylmFnX Tg6HDjbU108s+N7VYezmlJj8w1N/LidNqeB+CxMokQk0fKXNsN/SOFf9ymoANVtJQN kJZflbm8C4YZQ== Date: Wed, 8 May 2024 13:14:29 -0700 From: "Darrick J. Wong" To: Eric Biggers Cc: Christoph Hellwig , aalbersh@redhat.com, linux-xfs@vger.kernel.org, alexl@redhat.com, walters@verbum.org, fsverity@lists.linux.dev, linux-fsdevel@vger.kernel.org Subject: Re: [PATCH 03/18] fsverity: convert verification to use byte instead of page offsets Message-ID: <20240508201429.GB360919@frogsfrogsfrogs> References: <171444679542.955480.18087310571597618350.stgit@frogsfrogsfrogs> <171444679642.955480.14668034329027994356.stgit@frogsfrogsfrogs> <20240501223303.GF360919@frogsfrogsfrogs> <20240502004231.GC1853833@google.com> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20240502004231.GC1853833@google.com> On Thu, May 02, 2024 at 12:42:31AM +0000, Eric Biggers wrote: > On Wed, May 01, 2024 at 03:33:03PM -0700, Darrick J. Wong wrote: > > On Wed, May 01, 2024 at 12:33:14AM -0700, Christoph Hellwig wrote: > > > > + const u64 end_pos = min(pos + length, vi->tree_params.tree_size); > > > > + struct backing_dev_info *bdi = inode->i_sb->s_bdi; > > > > + const u64 max_ra_bytes = min((u64)bdi->io_pages << PAGE_SHIFT, > > > > + ULONG_MAX); > > > > + const struct merkle_tree_params *params = &vi->tree_params; > > > > > > bdi->io_pages is really a VM readahead concept. I know this is existing > > > code, but can we rething why this is even used here? > > > > I would get rid of it entirely for the merkle-by-block case, since we'd > > have to walk the xattr tree again just to find the next block. XFS > > ignores the readahead value entirely. > > > > I think this only makes sense for the merkle-by-page case, and only > > because ext4 and friends are stuffing the merkle data in the posteof > > parts of the file mapping. > > > > And even then, shouldn't we figure out the amount of readahead going on > > and only ask for enough readahead of the merkle tree to satisfy that > > readahead? > > The existing code is: > > unsigned long num_ra_pages = > min_t(unsigned long, last_index - index + 1, > inode->i_sb->s_bdi->io_pages); > > So it does limit the readahead amount to the amount remaining to be read. > > In addition, it's limited to io_pages. It's possible that's not the best value > to use (maybe it should be ra_pages?), but the intent was to just use a large > readahead size, since this code is doing a fully sequential read. io_pages is supposed to be the optimal IO size, whereas ra_pages is the readahead size for the block device. I don't know why you chose io_pages, but I'm assuming there's a reason there. :) Somewhat confusingly, I think mm/readahead.c picks the maximum of io_pages and ra_pages, which doesn't clear things up for me either. Personally I think fsverity should be using ra_pages here, but changing it should be a different patch with a separate justification. This patch simply has to translate the merkle-by-page code to handle by-block. > I do think that the concept of Merkle tree readahead makes sense regardless of > how the blocks are being stored. Having to go to disk every time a new 4K > Merkle tree block is needed increases read latencies. It doesn't need to be > included in the initial implementation though. Of course, if we're really ok with xfs making a giant left turn and storing the entire merkle tree as one big chunk of file range in the attr fork, then suddenly it *does* make sense to allow merkle tree readahead again. > > > And the returned/passed value should be a kernel pointer to the start > > > of the in-memory copy of the block? > > > to > > > > This particular callsite is reading merkle data on behalf of an > > ioctl that exports data. Maybe we want the filesystem's errors to be > > bounced up to userspace? > > Yes, I think so. Ok, thanks for confirming that. > > > > +static bool is_hash_block_verified(struct inode *inode, > > > > + struct fsverity_blockbuf *block, > > > > unsigned long hblock_idx) > > > > > > Other fsverify code seems to use the (IMHO) much more readable > > > two-tab indentation for prototype continuations, maybe stick to that? > > > > I'll do that, if Eric says so. :) > > My preference is to align continuations with the line that they're continuing: > > static bool is_hash_block_verified(struct inode *inode, > struct fsverity_blockbuf *block, > unsigned long hblock_idx) > > > > > > > > > { > > > > + struct fsverity_info *vi = inode->i_verity_info; > > > > + struct page *hpage = (struct page *)block->context; > > > > > > block->context is a void pointer, no need for casting it. > > > > Eric insisted on it: > > https://lore.kernel.org/linux-xfs/20240306035622.GA68962@sol.localdomain/ > > No, I didn't. It showed up in some code snippets that I suggested, but the > casts originated from the patch itself. Leaving out the cast is fine with me. Oh ok. I'll drop those then. > > > > > > + for (; level > 0; level--) > > > > + fsverity_drop_merkle_tree_block(inode, &hblocks[level - 1].block); > > > > > > Overlh long line here. But the loop kinda looks odd anyway with the > > > exta one off in the body instead of the loop. > > > > I /think/ that's a side effect of reusing the value of @level after the > > first loop fails as the initial conditions of the unwind loop. AFAICT > > it doesn't leak, but it's not entirely straightforward. > > When an error occurs either ascending or descending the tree, we end up here > with 'level' containing the number of levels that need to be cleaned up. It > might be clearer if it was called 'num_levels', though that could be confused > with 'params->num_levels'. Or we could use: 'while (level-- > 0)'. > > This is unrelated to this patch though. --D > - Eric >