Date: Wed, 14 Dec 2022 08:40:47 +1100
From: Dave Chinner
To: Eric Biggers
Cc: Andrey Albershteyn, linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: [RFC PATCH 10/11] xfs: add fs-verity support
Message-ID: <20221213214047.GY3600936@dread.disaster.area>
References: <20221213172935.680971-1-aalbersh@redhat.com> <20221213172935.680971-11-aalbersh@redhat.com> <20221213203319.GV3600936@dread.disaster.area>
X-Mailing-List: linux-xfs@vger.kernel.org

On Tue, Dec 13, 2022 at 12:39:39PM -0800, Eric Biggers wrote:
> On Wed, Dec 14, 2022 at 07:33:19AM +1100, Dave Chinner wrote:
> > On Tue, Dec 13, 2022 at 11:08:45AM -0800, Eric Biggers wrote:
> > > On Tue, Dec 13, 2022 at 06:29:34PM
> > > +0100, Andrey Albershteyn wrote:
> > > >
> > > > Also add check that block size == PAGE_SIZE as fs-verity doesn't
> > > > support different sizes yet.
> > >
> > > That's coming with
> > > https://lore.kernel.org/linux-fsdevel/20221028224539.171818-1-ebiggers@kernel.org/T/#u,
> > > which I'll be resending soon and I hope to apply for 6.3.
> > > Review and testing of that patchset, along with its associated xfstests update
> > > (https://lore.kernel.org/fstests/20221211070704.341481-1-ebiggers@kernel.org/T/#u),
> > > would be greatly appreciated.
> > >
> > > Note, as proposed there will still be a limit of:
> > >
> > >     merkle_tree_block_size <= fs_block_size <= page_size
> >
> > > Hopefully you don't need fs_block_size > page_size or
> >
> > Yes, we will.
> >
> > This back on my radar now that folios have settled down. It's
> > pretty trivial for XFS to do because we already support metadata
> > block sizes > filesystem block size. Here is an old prototype:
> >
> > https://lore.kernel.org/linux-xfs/20181107063127.3902-1-david@fromorbit.com/
>
> As per my follow-up response
> (https://lore.kernel.org/r/Y5jc7P1ZeWHiTKRF@sol.localdomain),
> I now think that wouldn't actually be a problem.

Good to hear.

> > > merkle_tree_block_size > fs_block_size?
> >
> > That's also a desirable addition.
> >
> > XFS is using xattrs to hold merkle tree blocks so the merkle tree
> > storage is are already independent of the filesystem block size and
> > page cache limitations. Being able to using 64kB merkle tree blocks
> > would be really handy for reducing the search depth and overall IO
> > footprint of really large files.
>
> Well, the main problem is that using a Merkle tree block of 64K would mean that
> you can never read less than 64K at a time.

Sure, but why does that matter? The typical cost of a 64kB IO is only
about 5% more than a 4kB IO, even on slow spinning storage.
However, we bring an order of magnitude more data into the cache with
that IO, so we can then process more data before we have to go to disk
again and take another latency hit.

FYI, we have this large 64kB block size option for directories in XFS
already - you can have a 4kB block size filesystem with a 64kB
directory block size. The larger block size is a little slower for
small directories because they have higher per-leaf block CPU
processing overhead, but once you get to millions of records in a
single directory or really high sustained IO load, the larger block
size is *much* faster because the reduced IO latency and better search
efficiency more than make up for the single block CPU processing
overhead...

The merkle tree is little different - once we get into TB scale files,
the merkle tree is indexing millions of individual records. At this
point, overall record lookup and IO efficiency dominate the data access
time, not the amount of data each individual IO retrieves from disk.

Keep in mind that the block size used for the merkle tree would be a
filesystem choice. If we have the capability to support 64kB merkle
tree blocks, then XFS can make the choice of what block size to use at
the point where we are measuring the file because we know how large the
file is at that point. And because we're storing the merkle tree blocks
in xattrs, we know exactly what block size the merkle tree data was
stored in from the xattr metadata...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
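
As a rough illustration of the search-depth argument above, the sketch
below counts Merkle tree levels and tree blocks for a given file size.
It is not taken from the patchset: the function name is hypothetical,
and it assumes 32-byte (SHA-256) digests and that data blocks and tree
blocks share one block size.

```python
def merkle_tree_levels(file_size, block_size, digest_size=32):
    """Return (levels, tree_blocks) for a Merkle tree over file_size bytes.

    Assumptions (not from the thread): SHA-256 sized digests, and the
    same block size used for both data blocks and tree blocks.
    """
    hashes_per_block = block_size // digest_size
    # Number of data blocks that need a digest at the leaf level.
    blocks = -(-file_size // block_size)  # ceiling division
    levels = 0
    tree_blocks = 0
    while blocks > 1:
        # Each level stores one digest per block of the level below it.
        blocks = -(-blocks // hashes_per_block)
        tree_blocks += blocks
        levels += 1
    return levels, tree_blocks


if __name__ == "__main__":
    one_tib = 1 << 40
    for block_size in (4096, 65536):
        levels, tree_blocks = merkle_tree_levels(one_tib, block_size)
        print(f"{block_size // 1024}kB blocks: {levels} levels, "
              f"{tree_blocks} tree blocks")
```

Under these assumptions a 1TiB file needs 4 levels with 4kB blocks and
3 levels with 64kB blocks, so a cold lookup touches one fewer tree
block, and the tree is made up of far fewer, larger blocks.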