From: Dave Chinner <david@fromorbit.com>
To: Andrey Albershteyn <aalbersh@redhat.com>
Cc: Eric Biggers <ebiggers@kernel.org>,
djwong@kernel.org, dchinner@redhat.com, hch@infradead.org,
linux-xfs@vger.kernel.org, fsverity@lists.linux.dev,
rpeterso@redhat.com, agruenba@redhat.com, xiang@kernel.org,
chao@kernel.org, damien.lemoal@opensource.wdc.com,
jth@kernel.org, linux-erofs@lists.ozlabs.org,
linux-btrfs@vger.kernel.org, linux-ext4@vger.kernel.org,
linux-f2fs-devel@lists.sourceforge.net, cluster-devel@redhat.com
Subject: Re: [PATCH v2 21/23] xfs: handle merkle tree block size != fs blocksize != PAGE_SIZE
Date: Thu, 6 Apr 2023 08:51:25 +1000 [thread overview]
Message-ID: <20230405225125.GS3223426@dread.disaster.area> (raw)
In-Reply-To: <20230405151234.sgkuasb7lwmgetzz@aalbersh.remote.csb>
On Wed, Apr 05, 2023 at 05:12:34PM +0200, Andrey Albershteyn wrote:
> Hi Eric,
>
> On Tue, Apr 04, 2023 at 04:32:24PM -0700, Eric Biggers wrote:
> > Hi Andrey,
> >
> > On Tue, Apr 04, 2023 at 04:53:17PM +0200, Andrey Albershteyn wrote:
> > > In case of different Merkle tree block size fs-verity expects
> > > ->read_merkle_tree_page() to return Merkle tree page filled with
> > > Merkle tree blocks. The XFS stores each merkle tree block under
> > > extended attribute. Those attributes are addressed by block offset
> > > into Merkle tree.
> > >
> > > This patch make ->read_merkle_tree_page() to fetch multiple merkle
> > > tree blocks based on size ratio. Also the reference to each xfs_buf
> > > is passed with page->private to ->drop_page().
> > >
> > > Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
> > > ---
> > > fs/xfs/xfs_verity.c | 74 +++++++++++++++++++++++++++++++++++----------
> > > fs/xfs/xfs_verity.h | 8 +++++
> > > 2 files changed, 66 insertions(+), 16 deletions(-)
> > >
> > > diff --git a/fs/xfs/xfs_verity.c b/fs/xfs/xfs_verity.c
> > > index a9874ff4efcd..ef0aff216f06 100644
> > > --- a/fs/xfs/xfs_verity.c
> > > +++ b/fs/xfs/xfs_verity.c
> > > @@ -134,6 +134,10 @@ xfs_read_merkle_tree_page(
> > > struct page *page = NULL;
> > > __be64 name = cpu_to_be64(index << PAGE_SHIFT);
> > > uint32_t bs = 1 << log_blocksize;
> > > + int blocks_per_page =
> > > + (1 << (PAGE_SHIFT - log_blocksize));
> > > + int n = 0;
> > > + int offset = 0;
> > > struct xfs_da_args args = {
> > > .dp = ip,
> > > .attr_filter = XFS_ATTR_VERITY,
> > > @@ -143,26 +147,59 @@ xfs_read_merkle_tree_page(
> > > .valuelen = bs,
> > > };
> > > int error = 0;
> > > + bool is_checked = true;
> > > + struct xfs_verity_buf_list *buf_list;
> > >
> > > page = alloc_page(GFP_KERNEL);
> > > if (!page)
> > > return ERR_PTR(-ENOMEM);
> > >
> > > - error = xfs_attr_get(&args);
> > > - if (error) {
> > > - kmem_free(args.value);
> > > - xfs_buf_rele(args.bp);
> > > + buf_list = kzalloc(sizeof(struct xfs_verity_buf_list), GFP_KERNEL);
> > > + if (!buf_list) {
> > > put_page(page);
> > > - return ERR_PTR(-EFAULT);
> > > + return ERR_PTR(-ENOMEM);
> > > }
> > >
> > > - if (args.bp->b_flags & XBF_VERITY_CHECKED)
> > > + /*
> > > + * Fill the page with Merkle tree blocks. The blcoks_per_page is higher
> > > + * than 1 when fs block size != PAGE_SIZE or Merkle tree block size !=
> > > + * PAGE SIZE
> > > + */
> > > + for (n = 0; n < blocks_per_page; n++) {
> > > + offset = bs * n;
> > > + name = cpu_to_be64(((index << PAGE_SHIFT) + offset));
> > > + args.name = (const uint8_t *)&name;
> > > +
> > > + error = xfs_attr_get(&args);
> > > + if (error) {
> > > + kmem_free(args.value);
> > > + /*
> > > + * No more Merkle tree blocks (e.g. this was the last
> > > + * block of the tree)
> > > + */
> > > + if (error == -ENOATTR)
> > > + break;
> > > + xfs_buf_rele(args.bp);
> > > + put_page(page);
> > > + kmem_free(buf_list);
> > > + return ERR_PTR(-EFAULT);
> > > + }
> > > +
> > > + buf_list->bufs[buf_list->buf_count++] = args.bp;
> > > +
> > > + /* One of the buffers was dropped */
> > > + if (!(args.bp->b_flags & XBF_VERITY_CHECKED))
> > > + is_checked = false;
> > > +
> > > + memcpy(page_address(page) + offset, args.value, args.valuelen);
> > > + kmem_free(args.value);
> > > + args.value = NULL;
> > > + }
> >
> > I was really hoping for a solution where the cached data can be used directly,
> > instead allocating a temporary page and copying the cached data into it every
> > time the cache is accessed. The problem with what you have now is that every
> > time a single 32-byte hash is accessed, a full page (potentially 64KB!) will be
> > allocated and filled. That's not very efficient. The need to allocate the
> > temporary page can also cause ENOMEM (which will get reported as EIO).
> >
> > Did you consider alternatives that would work more efficiently? I think it
> > would be worth designing something that works properly with how XFS is planned
> > to cache the Merkle tree, instead of designing a workaround.
> > ->read_merkle_tree_page was not really designed for what you are doing here.
> >
> > How about replacing ->read_merkle_tree_page with a function that takes in a
> > Merkle tree block index (not a page index!) and hands back a (page, offset) pair
> > that identifies where the Merkle tree block's data is located? Or (folio,
> > offset), I suppose.
{kaddr, len}, please.
> >
> > With that, would it be possible to directly return the XFS cache?
> >
> > - Eric
> >
>
> Yeah, I also don't like it, I didn't want to change fs-verity much
> so went with this workaround. But as it's ok, I will look into trying
> to pass xfs buffers to fs-verity without direct use of
> ->read_merkle_tree_page(). I think it's possible with (folio,
> offset), the xfs buffers aren't xattr value align so the 4k merkle
> tree block is stored in two pages.
I don't think this is necessary to actually merge the code. We want
it to work correctly as the primary concern, performance is a
secondary concern.
Regardless, as you mention, the xfs_buf is not made up of contiguous
pages so the merkle tree block data will be split across two
(or more) pages. AFAICT, the fsverity code doesn't work with data
structures that span multiple disjoint pages...
Another problem is that the xfs-buf might be backed by heap memory
(e.g. 4kB fs block size on 64kB PAGE_SIZE) and so it cannot be
treated like a page cache page by the fsverity merkle tree code. We
most definitely do not want to be passing pages containing heap
memory to functions expecting to be passed lru-resident page cache
pages....
That said, xfs-bufs do have a stable method of addressing the data
in the buffers, and all the XFS code uses this to access and
manipulate data directly in the buffers.
That is, xfs_buf_offset() returns a mapped kaddr that points to the
contiguous memory region containing the metadata in the buffer. If
the xfs_buf spans multiple pages, it will return a kaddr pointing
into the contiguous vmapped memory address that maps the entire
buffer data range. If it is heap memory, it simply returns a pointer
into that heap memory. If it's a single page, then it returns the
kaddr for the data within the page.
If you move all the assumptions about how the merkle tree data is
managed out of fsverity and require the fielsystems to do the
mapping to kaddrs and reference counting to guarantee life times,
then the need for multiple different methods for reading merkle tree
data go away...
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
next prev parent reply other threads:[~2023-04-05 22:51 UTC|newest]
Thread overview: 70+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-04-04 14:52 [PATCH v2 00/23] fs-verity support for XFS Andrey Albershteyn
2023-04-04 14:52 ` [PATCH v2 01/23] xfs: Add new name to attri/d Andrey Albershteyn
2023-04-04 14:52 ` [PATCH v2 02/23] xfs: add parent pointer support to attribute code Andrey Albershteyn
2023-04-04 14:52 ` [PATCH v2 03/23] xfs: define parent pointer xattr format Andrey Albershteyn
2023-04-04 14:53 ` [PATCH v2 04/23] xfs: Add xfs_verify_pptr Andrey Albershteyn
2023-04-04 14:53 ` [PATCH v2 05/23] fsverity: make fsverity_verify_folio() accept folio's offset and size Andrey Albershteyn
2023-04-04 15:30 ` Christoph Hellwig
2023-04-05 10:36 ` Andrey Albershteyn
2023-04-05 15:46 ` Christoph Hellwig
2023-04-05 17:50 ` Eric Biggers
2023-04-04 14:53 ` [PATCH v2 06/23] fsverity: add drop_page() callout Andrey Albershteyn
2023-04-04 23:40 ` Dave Chinner
2023-04-05 10:39 ` Andrey Albershteyn
2023-04-04 14:53 ` [PATCH v2 07/23] fsverity: pass Merkle tree block size to ->read_merkle_tree_page() Andrey Albershteyn
2023-04-04 14:53 ` [PATCH v2 08/23] iomap: hoist iomap_readpage_ctx from the iomap_readahead/_folio Andrey Albershteyn
2023-04-04 15:32 ` Christoph Hellwig
2023-04-04 14:53 ` [PATCH v2 09/23] iomap: allow filesystem to implement read path verification Andrey Albershteyn
2023-04-04 15:37 ` Christoph Hellwig
2023-04-05 11:01 ` Andrey Albershteyn
2023-04-05 15:06 ` Darrick J. Wong
2023-04-05 15:48 ` Christoph Hellwig
2023-04-04 14:53 ` [PATCH v2 10/23] xfs: add XBF_VERITY_CHECKED xfs_buf flag Andrey Albershteyn
2023-04-04 14:53 ` [PATCH v2 11/23] xfs: add XFS_DA_OP_BUFFER to make xfs_attr_get() return buffer Andrey Albershteyn
2023-04-04 14:53 ` [PATCH v2 12/23] xfs: introduce workqueue for post read IO work Andrey Albershteyn
2023-04-04 14:53 ` [PATCH v2 13/23] xfs: add iomap's readpage operations Andrey Albershteyn
2023-04-04 14:53 ` [PATCH v2 14/23] xfs: add attribute type for fs-verity Andrey Albershteyn
2023-04-04 14:53 ` [PATCH v2 15/23] xfs: add fs-verity ro-compat flag Andrey Albershteyn
2023-04-04 14:53 ` [PATCH v2 16/23] xfs: add inode on-disk VERITY flag Andrey Albershteyn
2023-04-04 22:41 ` Eric Biggers
2023-04-04 23:56 ` Dave Chinner
2023-04-05 11:07 ` Andrey Albershteyn
2023-04-04 14:53 ` [PATCH v2 17/23] xfs: initialize fs-verity on file open and cleanup on inode destruction Andrey Albershteyn
2023-04-04 14:53 ` [PATCH v2 18/23] xfs: don't allow to enable DAX on fs-verity sealsed inode Andrey Albershteyn
2023-04-04 14:53 ` [PATCH v2 19/23] xfs: disable direct read path for fs-verity sealed files Andrey Albershteyn
2023-04-04 16:10 ` Darrick J. Wong
2023-04-05 15:01 ` Andrey Albershteyn
2023-04-05 15:09 ` Darrick J. Wong
2023-04-05 15:50 ` Christoph Hellwig
2023-04-05 18:02 ` Eric Biggers
2023-04-05 22:14 ` Dave Chinner
2023-04-05 22:10 ` Dave Chinner
2023-04-04 14:53 ` [PATCH v2 20/23] xfs: add fs-verity support Andrey Albershteyn
2023-04-04 16:27 ` Darrick J. Wong
2023-04-05 15:18 ` Eric Sandeen
2023-04-04 18:01 ` kernel test robot
2023-04-04 20:03 ` kernel test robot
2023-04-04 14:53 ` [PATCH v2 21/23] xfs: handle merkle tree block size != fs blocksize != PAGE_SIZE Andrey Albershteyn
2023-04-04 16:36 ` Darrick J. Wong
2023-04-05 16:02 ` Andrey Albershteyn
2023-04-05 16:38 ` Darrick J. Wong
2023-04-05 18:16 ` Eric Biggers
2023-04-05 22:26 ` Dave Chinner
2023-04-05 22:54 ` Eric Biggers
2023-04-05 23:37 ` Dave Chinner
2023-04-06 0:44 ` Eric Biggers
2023-04-07 19:56 ` Eric Biggers
2023-04-04 23:32 ` Eric Biggers
2023-04-05 15:12 ` Andrey Albershteyn
2023-04-05 22:51 ` Dave Chinner [this message]
2023-04-04 14:53 ` [PATCH v2 22/23] xfs: add fs-verity ioctls Andrey Albershteyn
2023-04-04 14:53 ` [PATCH v2 23/23] xfs: enable ro-compat fs-verity flag Andrey Albershteyn
2023-04-04 16:39 ` [PATCH v2 00/23] fs-verity support for XFS Darrick J. Wong
2023-04-05 16:27 ` Andrey Albershteyn
2023-04-04 23:37 ` Eric Biggers
2023-04-05 16:04 ` Andrey Albershteyn
2023-04-11 5:19 ` Christoph Hellwig
2023-04-12 2:33 ` Eric Biggers
2023-04-12 3:18 ` Dave Chinner
2023-04-12 12:42 ` Christoph Hellwig
2023-04-12 12:40 ` Christoph Hellwig
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230405225125.GS3223426@dread.disaster.area \
--to=david@fromorbit.com \
--cc=aalbersh@redhat.com \
--cc=agruenba@redhat.com \
--cc=chao@kernel.org \
--cc=cluster-devel@redhat.com \
--cc=damien.lemoal@opensource.wdc.com \
--cc=dchinner@redhat.com \
--cc=djwong@kernel.org \
--cc=ebiggers@kernel.org \
--cc=fsverity@lists.linux.dev \
--cc=hch@infradead.org \
--cc=jth@kernel.org \
--cc=linux-btrfs@vger.kernel.org \
--cc=linux-erofs@lists.ozlabs.org \
--cc=linux-ext4@vger.kernel.org \
--cc=linux-f2fs-devel@lists.sourceforge.net \
--cc=linux-xfs@vger.kernel.org \
--cc=rpeterso@redhat.com \
--cc=xiang@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox