From: "Darrick J. Wong" <djwong@kernel.org>
To: Dave Chinner <david@fromorbit.com>
Cc: Andrey Albershteyn <aalbersh@redhat.com>,
fsverity@lists.linux.dev, linux-xfs@vger.kernel.org,
linux-fsdevel@vger.kernel.org, chandan.babu@oracle.com,
ebiggers@kernel.org
Subject: Re: [PATCH v5 11/24] xfs: add XBF_VERITY_SEEN xfs_buf flag
Date: Thu, 7 Mar 2024 19:31:38 -0800 [thread overview]
Message-ID: <20240308033138.GN6184@frogsfrogsfrogs> (raw)
In-Reply-To: <ZepxHObVLb3JLCl/@dread.disaster.area>
On Fri, Mar 08, 2024 at 12:59:56PM +1100, Dave Chinner wrote:
> On Thu, Mar 07, 2024 at 02:46:54PM -0800, Darrick J. Wong wrote:
> > On Mon, Mar 04, 2024 at 08:10:34PM +0100, Andrey Albershteyn wrote:
> > > One of essential ideas of fs-verity is that pages which are already
> > > verified won't need to be re-verified if they still in page cache.
> > >
> > > XFS will store Merkle tree blocks in extended file attributes. When
> > > read extended attribute data is put into xfs_buf.
> > >
> > > fs-verity uses PG_checked flag to track status of the blocks in the
> > > page. This flag can has two meanings - page was re-instantiated and
> > > the only block in the page is verified.
> > >
> > > However, in XFS, the data in the buffer is not aligned with xfs_buf
> > > pages and we don't have a reference to these pages. Moreover, these
> > > pages are released when value is copied out in xfs_attr code. In
> > > other words, we can not directly mark underlying xfs_buf's pages as
> > > verified as it's done by fs-verity for other filesystems.
> > >
> > > One way to track that these pages were processed by fs-verity is to
> > > mark buffer as verified instead. If buffer is evicted the incore
> > > XBF_VERITY_SEEN flag is lost. When the xattr is read again
> > > xfs_attr_get() returns new buffer without the flag. The xfs_buf's
> > > flag is then used to tell fs-verity this buffer was cached or not.
> > >
> > > The second state indicated by PG_checked is if the only block in the
> > > PAGE is verified. This is not the case for XFS as there could be
> > > multiple blocks in single buffer (page size 64k block size 4k). This
> > > is handled by fs-verity bitmap. fs-verity is always uses bitmap for
> > > XFS despite of Merkle tree block size.
> > >
> > > The meaning of the flag is that value of the extended attribute in
> > > the buffer is processed by fs-verity.
> > >
> > > Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
> > > ---
> > > fs/xfs/xfs_buf.h | 18 ++++++++++--------
> > > 1 file changed, 10 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
> > > index 73249abca968..2a73918193ba 100644
> > > --- a/fs/xfs/xfs_buf.h
> > > +++ b/fs/xfs/xfs_buf.h
> > > @@ -24,14 +24,15 @@ struct xfs_buf;
> > >
> > > #define XFS_BUF_DADDR_NULL ((xfs_daddr_t) (-1LL))
> > >
> > > -#define XBF_READ (1u << 0) /* buffer intended for reading from device */
> > > -#define XBF_WRITE (1u << 1) /* buffer intended for writing to device */
> > > -#define XBF_READ_AHEAD (1u << 2) /* asynchronous read-ahead */
> > > -#define XBF_NO_IOACCT (1u << 3) /* bypass I/O accounting (non-LRU bufs) */
> > > -#define XBF_ASYNC (1u << 4) /* initiator will not wait for completion */
> > > -#define XBF_DONE (1u << 5) /* all pages in the buffer uptodate */
> > > -#define XBF_STALE (1u << 6) /* buffer has been staled, do not find it */
> > > -#define XBF_WRITE_FAIL (1u << 7) /* async writes have failed on this buffer */
> > > +#define XBF_READ (1u << 0) /* buffer intended for reading from device */
> > > +#define XBF_WRITE (1u << 1) /* buffer intended for writing to device */
> > > +#define XBF_READ_AHEAD (1u << 2) /* asynchronous read-ahead */
> > > +#define XBF_NO_IOACCT (1u << 3) /* bypass I/O accounting (non-LRU bufs) */
> > > +#define XBF_ASYNC (1u << 4) /* initiator will not wait for completion */
> > > +#define XBF_DONE (1u << 5) /* all pages in the buffer uptodate */
> > > +#define XBF_STALE (1u << 6) /* buffer has been staled, do not find it */
> > > +#define XBF_WRITE_FAIL (1u << 7) /* async writes have failed on this buffer */
> > > +#define XBF_VERITY_SEEN (1u << 8) /* buffer was processed by fs-verity */
> >
> > Yuck. I still dislike this entire approach.
> >
> > XBF_DOUBLE_ALLOC doubles the memory consumption of any xattr block that
> > gets loaded on behalf of a merkle tree request, then uses the extra
> > space to shadow the contents of the ondisk block. AFAICT the shadow
> > doesn't get updated even if the cached data does, which sounds like a
> > landmine for coherency issues.
> >
> > XFS_DA_OP_BUFFER is a little gross, since I don't like the idea of
> > exposing the low level buffering details of the xattr code to
> > xfs_attr_get callers.
> >
> > XBF_VERITY_SEEN is a layering violation because now the overall buffer
> > cache can track file metadata state. I think the reason why you need
> > this state flag is because the datadev buffer target indexes based on
> > physical xfs_daddr_t, whereas merkle tree blocks have their own internal
> > block numbers. You can't directly go from the merkle block number to an
> > xfs_daddr_t, so you can't use xfs_buf_incore to figure out if the block
> > fell out of memory.
> >
> > ISTR asking for a separation of these indices when I reviewed some
> > previous version of this patchset. At the time it seems to me that a
> > much more efficient way to cache the merkle tree blocks would be to set
> > up a direct (merkle tree block number) -> (blob of data) lookup table.
> > That I don't see here.
> >
> > In the spirit of the recent collaboration style that I've taken with
> > Christoph, I pulled your branch and started appending patches to it to
> > see if the design that I'm asking for is reasonable. As it so happens,
> > I was working on a simplified version of the xfs buffer cache ("fsbuf")
> > that could be used by simple filesystems to get them off of buffer
> > heads.
> >
> > (Ab)using the fsbuf code did indeed work (and passed all the fstests -g
> > verity tests), so now I know the idea is reasonable. Patches 11, 12,
> > 14, and 15 become unnecessary. However, this solution is itself grossly
> > overengineered, since all we want are the following operations:
> >
> > peek(key): returns an fsbuf if there's any data cached for key
> >
> > get(key): returns an fsbuf for key, regardless of state
> >
> > store(fsbuf, p): attach a memory buffer p to fsbuf
> >
> > Then the xfs ->read_merkle_tree_block function becomes:
> >
> > bp = peek(key)
> > if (bp)
> > /* return bp data up to verity */
> >
> > p = xfs_attr_get(key)
> > if (!p)
> > /* error */
> >
> > bp = get(key)
> > store(bp, p)
>
> Ok, that looks good - it definitely gets rid of a lot of the
> nastiness, but I have to ask: why does it need to be based on
> xfs_bufs?
(copying from IRC) It was still warm in my brain L2 after all the xfile
buftarg cleaning and merging that just got done a few weeks ago. So I
went with the simplest thing I could rig up to test my ideas, and now
we're at the madly iterate until exhaustion stage. ;)
> That's just wasting 300 bytes of memory on a handle to
> store a key and a opaque blob in a rhashtable.
Yep. The fsbufs implementation was a lot more slender, but a bunch more
code. I agree that I ought to go look at xarrays or something that's
more of a direct mapping as a next step. However, i wanted to get
Andrey's feedback on this general approach first.
> IIUC, the key here is a sequential index, so an xarray would be a
> much better choice as it doesn't require internal storage of the
> key.
I wonder, what are the access patterns for merkle blobs? Is it actually
sequential, or is more like 0 -> N -> N*N as we walk towards leaves?
Also -- the fsverity block interfaces pass in a "u64 pos" argument. Was
that done because merkle trees may some day have more than 2^32 blocks
in them? That won't play well with things like xarrays on 32-bit
machines.
(Granted we've been talking about deprecating XFS on 32-bit for a while
now but we're not the whole world)
> i.e.
>
> p = xa_load(key);
> if (p)
> return p;
>
> xfs_attr_get(key);
> if (!args->value)
> /* error */
>
> /*
> * store the current value, freeing any old value that we
> * replaced at this key. Don't care about failure to store,
> * this is optimistic caching.
> */
> p = xa_store(key, args->value, GFP_NOFS);
> if (p)
> kvfree(p);
> return args->value;
Attractive. Will have to take a look at that tomorrow.
--D
> -Dave.
> --
> Dave Chinner
> david@fromorbit.com
>
next prev parent reply other threads:[~2024-03-08 3:31 UTC|newest]
Thread overview: 94+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-03-04 19:10 [PATCH v5 00/24] fs-verity support for XFS Andrey Albershteyn
2024-03-04 19:10 ` [PATCH v5 01/24] fsverity: remove hash page spin lock Andrey Albershteyn
2024-03-04 19:10 ` [PATCH v5 02/24] xfs: add parent pointer support to attribute code Andrey Albershteyn
2024-03-04 19:10 ` [PATCH v5 03/24] xfs: define parent pointer ondisk extended attribute format Andrey Albershteyn
2024-03-04 19:10 ` [PATCH v5 04/24] xfs: add parent pointer validator functions Andrey Albershteyn
2024-03-04 19:10 ` [PATCH v5 05/24] fs: add FS_XFLAG_VERITY for verity files Andrey Albershteyn
2024-03-04 22:35 ` Eric Biggers
2024-03-07 21:39 ` Darrick J. Wong
2024-03-07 22:06 ` Eric Biggers
2024-03-04 19:10 ` [PATCH v5 06/24] fsverity: pass tree_blocksize to end_enable_verity() Andrey Albershteyn
2024-03-05 0:52 ` Eric Biggers
2024-03-06 16:30 ` Darrick J. Wong
2024-03-07 22:02 ` Eric Biggers
2024-03-08 3:46 ` Darrick J. Wong
2024-03-08 4:40 ` Eric Biggers
2024-03-11 22:38 ` Darrick J. Wong
2024-03-12 15:13 ` David Hildenbrand
2024-03-12 15:33 ` David Hildenbrand
2024-03-12 16:44 ` Darrick J. Wong
2024-03-13 12:29 ` David Hildenbrand
2024-03-13 17:19 ` Darrick J. Wong
2024-03-13 19:10 ` David Hildenbrand
2024-03-13 21:03 ` David Hildenbrand
2024-03-08 21:34 ` Dave Chinner
2024-03-09 16:19 ` Darrick J. Wong
2024-03-04 19:10 ` [PATCH v5 07/24] fsverity: support block-based Merkle tree caching Andrey Albershteyn
2024-03-06 3:56 ` Eric Biggers
2024-03-07 21:54 ` Darrick J. Wong
2024-03-07 22:49 ` Eric Biggers
2024-03-08 3:50 ` Darrick J. Wong
2024-03-09 16:24 ` Darrick J. Wong
2024-03-11 23:22 ` Darrick J. Wong
2024-03-04 19:10 ` [PATCH v5 08/24] fsverity: add per-sb workqueue for post read processing Andrey Albershteyn
2024-03-05 1:08 ` Eric Biggers
2024-03-07 21:58 ` Darrick J. Wong
2024-03-07 22:26 ` Eric Biggers
2024-03-08 3:53 ` Darrick J. Wong
2024-03-07 22:55 ` Dave Chinner
2024-03-04 19:10 ` [PATCH v5 09/24] fsverity: add tracepoints Andrey Albershteyn
2024-03-05 0:33 ` Eric Biggers
2024-03-04 19:10 ` [PATCH v5 10/24] iomap: integrate fs-verity verification into iomap's read path Andrey Albershteyn
2024-03-04 23:39 ` Eric Biggers
2024-03-07 22:06 ` Darrick J. Wong
2024-03-07 22:19 ` Eric Biggers
2024-03-07 23:38 ` Dave Chinner
2024-03-07 23:45 ` Darrick J. Wong
2024-03-08 0:47 ` Dave Chinner
2024-03-07 23:59 ` Eric Biggers
2024-03-08 1:20 ` Dave Chinner
2024-03-08 3:16 ` Eric Biggers
2024-03-08 3:57 ` Darrick J. Wong
2024-03-08 3:22 ` Darrick J. Wong
2024-03-04 19:10 ` [PATCH v5 11/24] xfs: add XBF_VERITY_SEEN xfs_buf flag Andrey Albershteyn
2024-03-07 22:46 ` Darrick J. Wong
2024-03-08 1:59 ` Dave Chinner
2024-03-08 3:31 ` Darrick J. Wong [this message]
2024-03-09 16:28 ` Darrick J. Wong
2024-03-11 0:26 ` Dave Chinner
2024-03-11 15:25 ` Darrick J. Wong
2024-03-12 2:43 ` Eric Biggers
2024-03-12 4:15 ` Darrick J. Wong
2024-03-12 2:45 ` Darrick J. Wong
2024-03-12 7:01 ` Dave Chinner
2024-03-12 20:04 ` Darrick J. Wong
2024-03-12 21:45 ` Dave Chinner
2024-03-04 19:10 ` [PATCH v5 12/24] xfs: add XFS_DA_OP_BUFFER to make xfs_attr_get() return buffer Andrey Albershteyn
2024-03-04 19:10 ` [PATCH v5 13/24] xfs: add attribute type for fs-verity Andrey Albershteyn
2024-03-04 19:10 ` [PATCH v5 14/24] xfs: make xfs_buf_get() to take XBF_* flags Andrey Albershteyn
2024-03-04 19:10 ` [PATCH v5 15/24] xfs: add XBF_DOUBLE_ALLOC to increase size of the buffer Andrey Albershteyn
2024-03-04 19:10 ` [PATCH v5 16/24] xfs: add fs-verity ro-compat flag Andrey Albershteyn
2024-03-04 19:10 ` [PATCH v5 17/24] xfs: add inode on-disk VERITY flag Andrey Albershteyn
2024-03-07 22:06 ` Darrick J. Wong
2024-03-04 19:10 ` [PATCH v5 18/24] xfs: initialize fs-verity on file open and cleanup on inode destruction Andrey Albershteyn
2024-03-07 22:09 ` Darrick J. Wong
2024-03-04 19:10 ` [PATCH v5 19/24] xfs: don't allow to enable DAX on fs-verity sealsed inode Andrey Albershteyn
2024-03-07 22:09 ` Darrick J. Wong
2024-03-04 19:10 ` [PATCH v5 20/24] xfs: disable direct read path for fs-verity files Andrey Albershteyn
2024-03-07 22:11 ` Darrick J. Wong
2024-03-12 12:02 ` Andrey Albershteyn
2024-03-12 16:36 ` Darrick J. Wong
2024-03-04 19:10 ` [PATCH v5 21/24] xfs: add fs-verity support Andrey Albershteyn
2024-03-06 4:55 ` Eric Biggers
2024-03-06 5:01 ` Eric Biggers
2024-03-07 23:10 ` Darrick J. Wong
2024-03-04 19:10 ` [PATCH v5 22/24] xfs: make scrub aware of verity dinode flag Andrey Albershteyn
2024-03-07 22:18 ` Darrick J. Wong
2024-03-12 12:10 ` Andrey Albershteyn
2024-03-12 16:38 ` Darrick J. Wong
2024-03-13 1:35 ` Darrick J. Wong
2024-03-04 19:10 ` [PATCH v5 23/24] xfs: add fs-verity ioctls Andrey Albershteyn
2024-03-07 22:14 ` Darrick J. Wong
2024-03-12 12:42 ` Andrey Albershteyn
2024-03-04 19:10 ` [PATCH v5 24/24] xfs: enable ro-compat fs-verity flag Andrey Albershteyn
2024-03-07 22:16 ` Darrick J. Wong
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240308033138.GN6184@frogsfrogsfrogs \
--to=djwong@kernel.org \
--cc=aalbersh@redhat.com \
--cc=chandan.babu@oracle.com \
--cc=david@fromorbit.com \
--cc=ebiggers@kernel.org \
--cc=fsverity@lists.linux.dev \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).