From: Dave Chinner <david@fromorbit.com>
To: Andrey Albershteyn <aalbersh@redhat.com>
Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: [RFC PATCH 09/11] iomap: fs-verity verification on page read
Date: Wed, 14 Dec 2022 16:43:57 +1100	[thread overview]
Message-ID: <20221214054357.GI3600936@dread.disaster.area> (raw)
In-Reply-To: <20221213172935.680971-10-aalbersh@redhat.com>

On Tue, Dec 13, 2022 at 06:29:33PM +0100, Andrey Albershteyn wrote:
> Add fs-verity page verification in read IO path. The verification
> itself is offloaded into workqueue (provided by fs-verity).
> 
> The work_struct items are allocated from bioset side by side with
> bio being processed.
> 
> As inodes with fs-verity doesn't use large folios we check only
> first page of the folio for errors (set by fs-verity if verification
> failed).
> 
> Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
> ---
>  fs/iomap/buffered-io.c | 80 +++++++++++++++++++++++++++++++++++++++---
>  include/linux/iomap.h  |  5 +++
>  2 files changed, 81 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 91ee0b308e13d..b7abc2f806cfc 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -17,6 +17,7 @@
>  #include <linux/bio.h>
>  #include <linux/sched/signal.h>
>  #include <linux/migrate.h>
> +#include <linux/fsverity.h>
>  #include "trace.h"
>  
>  #include "../internal.h"
> @@ -42,6 +43,7 @@ static inline struct iomap_page *to_iomap_page(struct folio *folio)
>  }
>  
>  static struct bio_set iomap_ioend_bioset;
> +static struct bio_set iomap_readend_bioset;
>  
>  static struct iomap_page *
>  iomap_page_create(struct inode *inode, struct folio *folio, unsigned int flags)
> @@ -189,9 +191,39 @@ static void iomap_read_end_io(struct bio *bio)
>  	int error = blk_status_to_errno(bio->bi_status);
>  	struct folio_iter fi;
>  
> -	bio_for_each_folio_all(fi, bio)
> +	bio_for_each_folio_all(fi, bio) {
> +		/*
> +		 * As fs-verity doesn't work with multi-page folios, verity
> +		 * inodes have large folios disabled (only single page folios
> +		 * are used)
> +		 */
> +		if (!error)
> +			error = PageError(folio_page(fi.folio, 0));
> +
>  		iomap_finish_folio_read(fi.folio, fi.offset, fi.length, error);
> +	}
> +
>  	bio_put(bio);
> +	/* The iomap_readend has been freed by bio_put() */
> +}
> +
> +static void iomap_read_work_end_io(
> +	struct work_struct *work)
> +{
> +	struct iomap_readend *ctx =
> +		container_of(work, struct iomap_readend, read_work);
> +	struct bio *bio = &ctx->read_inline_bio;
> +
> +	fsverity_verify_bio(bio);
> +	iomap_read_end_io(bio);
> +}
> +
> +static void iomap_read_work_io(struct bio *bio)
> +{
> +	struct iomap_readend *ctx =
> +		container_of(bio, struct iomap_readend, read_inline_bio);
> +
> +	fsverity_enqueue_verify_work(&ctx->read_work);
>  }

Ok, so fsverity simply queues this to a global workqueue it has set
up as WQ_UNBOUND | WQ_HIGHPRI with, effectively, one worker thread
per CPU. This will work for simple cases and to get the basic
infrastructure in place, but there are problems here that we will
need to address.
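
For reference, the setup in question looks something like this in
fs/verity at the moment (paraphrased from memory - the exact flags
and names may differ between kernel versions):

```c
/* fs/verity/verify.c (approximate) - one global queue for the
 * whole system, shared by every fsverity-enabled filesystem. */
fsverity_read_workqueue = alloc_workqueue("fsverity_read_queue",
					  WQ_UNBOUND | WQ_HIGHPRI,
					  num_online_cpus());
```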

1. High priority workqueues are used within XFS to ensure that data
IO completion cannot stall processing of journal IO completions. We
use them to prevent IO priority inversion deadlocks.

That is, journal IO completions use a WQ_HIGHPRI workqueue to ensure
that they are scheduled ahead of data IO completions as data IO
completions may require journal IO to complete before they can make
progress (e.g. to ensure transaction reservations in IO completion
can make progress).

Hence using a WQ_HIGHPRI workqueue directly in the user data IO path
is a potential filesystem livelock/deadlock vector. At best, it's
going to impact on the latency of journal commits and hence anything
that is running data integrity operations whilst fsverity read
operations are in progress.

2. The fsverity workqueue is global - it creates a cross-filesystem
contention point. That means a single busy fsverity filesystem will
create unpredictable IO latency for filesystems that only use
fsverity sparingly. i.e. heavy amounts of fsverity work on one
filesystem will impact the performance of journal and other IO
operations on all other active filesystems due to fsverity using
WQ_HIGHPRI.

This sort of workqueue setup is typically fine for a single user
device that has lots of otherwise idle CPUs that can be co-opted to
create concurrency in IO completion work where there would otherwise
be none (e.g. an Android phone).

However, it is less than ideal for a highly concurrent application
using AIO or io_uring to push a couple of million IOPS through the
filesystem. In these situations, we don't want IO completion work to
be spread out because there is no idle CPU for it to run on.
Instead, locality is king - we want completions to run on the same
CPU that already has the inode, bio, and other objects hot in its
cache.

So, really, for XFS we want per-filesystem, locality-preserving
per-cpu workqueues for fsverity work, as we get IO concurrency (and
therefore fsverity work concurrency) from the application IO
patterns. And it avoids any conflict with our use of WQ_HIGHPRI for
journal IO completions to prevent data IO completion deadlocks.
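
As a rough sketch of what I mean - all names here are hypothetical,
modelled on how we already create per-mount ioend workqueues in
xfs_init_mount_workqueues():

```c
/*
 * Hypothetical per-mount workqueue for post-read processing work
 * (fsverity, fscrypt, etc). Per-cpu (i.e. not WQ_UNBOUND) so that
 * completion work stays local to the CPU that took the completion
 * interrupt, and no WQ_HIGHPRI so it cannot starve journal IO
 * completion processing.
 */
mp->m_postread_workqueue = alloc_workqueue("xfs-postread/%s",
		WQ_FREEZABLE | WQ_MEM_RECLAIM, 0,
		mp->m_super->s_id);
```

Concurrency then comes from however many CPUs the application
completes IO on, not from a fixed global pool.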


>  struct iomap_readpage_ctx {
> @@ -264,6 +296,7 @@ static loff_t iomap_readpage_iter(const struct iomap_iter *iter,
>  	loff_t orig_pos = pos;
>  	size_t poff, plen;
>  	sector_t sector;
> +	struct iomap_readend *readend;
>  
>  	if (iomap->type == IOMAP_INLINE)
>  		return iomap_read_inline_data(iter, folio);
> @@ -276,7 +309,21 @@ static loff_t iomap_readpage_iter(const struct iomap_iter *iter,
>  
>  	if (iomap_block_needs_zeroing(iter, pos)) {
>  		folio_zero_range(folio, poff, plen);
> -		iomap_set_range_uptodate(folio, iop, poff, plen);
> +		if (!fsverity_active(iter->inode)) {
> +			iomap_set_range_uptodate(folio, iop, poff, plen);
> +			goto done;
> +		}
> +
> +		/*
> +		 * As fs-verity doesn't work with folios sealed inodes have
> +		 * multi-page folios disabled and we can check on first and only
> +		 * page
> +		 */
> +		if (fsverity_verify_page(folio_page(folio, 0)))
> +			iomap_set_range_uptodate(folio, iop, poff, plen);
> +		else
> +			folio_set_error(folio);

This makes me wonder if this should be an iomap->page_ops method
rather than calling fsverity code directly. i.e. the filesystem
knows that fsverity is enabled on the inode during its
iomap_begin() callout, and that state *must not change* until the IO
is complete. Hence it can set an iomap flag saying
IOMAP_F_READ_VERIFY and add a page_ops vector with a "verify_folio"
callout. Then this code can do:

	if (iomap_block_needs_zeroing(iter, pos)) {
		folio_zero_range(folio, poff, plen);
		if (iomap->flags & IOMAP_F_READ_VERIFY) {
			if (!iomap->page_ops->verify_folio(folio, poff, plen)) {
				folio_set_error(folio);
				goto done;
			}
		}
		iomap_set_range_uptodate(folio, iop, poff, plen);
		goto done;
	}

> @@ -297,8 +344,18 @@ static loff_t iomap_readpage_iter(const struct iomap_iter *iter,
>  
>  		if (ctx->rac) /* same as readahead_gfp_mask */
>  			gfp |= __GFP_NORETRY | __GFP_NOWARN;
> -		ctx->bio = bio_alloc(iomap->bdev, bio_max_segs(nr_vecs),
> +		if (fsverity_active(iter->inode)) {
> +			ctx->bio = bio_alloc_bioset(iomap->bdev,
> +					bio_max_segs(nr_vecs), REQ_OP_READ,
> +					GFP_NOFS, &iomap_readend_bioset);
> +			readend = container_of(ctx->bio,
> +					struct iomap_readend,
> +					read_inline_bio);
> +			INIT_WORK(&readend->read_work, iomap_read_work_end_io);
> +		} else {
> +			ctx->bio = bio_alloc(iomap->bdev, bio_max_segs(nr_vecs),
>  				     REQ_OP_READ, gfp);
> +		}
>  		/*
>  		 * If the bio_alloc fails, try it again for a single page to
>  		 * avoid having to deal with partial page reads.  This emulates
> @@ -311,7 +368,11 @@ static loff_t iomap_readpage_iter(const struct iomap_iter *iter,
>  		if (ctx->rac)
>  			ctx->bio->bi_opf |= REQ_RAHEAD;
>  		ctx->bio->bi_iter.bi_sector = sector;
> -		ctx->bio->bi_end_io = iomap_read_end_io;
> +		if (fsverity_active(iter->inode))
> +			ctx->bio->bi_end_io = iomap_read_work_io;
> +		else
> +			ctx->bio->bi_end_io = iomap_read_end_io;


Hmmm. OK, why not just always use the iomap_readend_bioset, put
a flags field in it and then do this check in iomap_read_end_io()?

i.e.

struct iomap_read_ioend {
	bool			queue_work;
	struct work_struct	work;	/* post read work (fs-verity) */
	struct bio		inline_bio;/* MUST BE LAST! */
};


Then always allocate from the iomap_read_ioend_bioset, and do:

		ctx->bio->bi_end_io = iomap_read_end_io;
		if (iomap->flags & IOMAP_F_VERITY) {
			read_ioend->queue_work = true;
			INIT_WORK(&read_ioend->work, iomap_read_ioend_work);
		}

Then in iomap_read_end_io():

	if (read_ioend->queue_work) {
		fsverity_enqueue_verify_work(&read_ioend->work);
		return;
	}
	/* otherwise fall through to the normal completion processing */
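
The "MUST BE LAST!" constraint and the container_of() recovery above
can be illustrated with a small userspace analogue. The struct and
function names below are stand-ins, not kernel code; in the kernel
the bio must be the last member because bio_alloc_bioset() lays the
inline bio_vec array out immediately after the struct bio:

```c
#include <assert.h>
#include <stddef.h>

/* Userspace copy of the kernel's container_of() pattern: recover a
 * pointer to the wrapper struct from a pointer to one of its
 * members using offsetof(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct fake_bio {		/* stand-in for struct bio */
	int status;
};

struct read_ioend {		/* stand-in for the read ioend wrapper */
	int queue_work;
	struct fake_bio inline_bio;	/* MUST BE LAST! */
};

/* What a bi_end_io handler does: it is handed only the bio pointer
 * and recovers the enclosing ioend from it. */
static struct read_ioend *ioend_from_bio(struct fake_bio *bio)
{
	return container_of(bio, struct read_ioend, inline_bio);
}
```

This is why the embedded bio costs no extra allocation: the ioend
and its bio are one object, and the completion path gets back to the
ioend with pure pointer arithmetic.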

And if we put a fs-supplied workqueue into the iomap as well, then
we have a mechanism for the filesystem to supply its own workqueue
for fsverity, fscrypt, or any other read-side post-IO data
processing that filesystems care to add.
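
i.e. something like this (field names invented for illustration -
the workqueue pointer would be copied from the fs-supplied iomap
into the ioend when the bio is allocated):

```c
struct iomap_read_ioend {
	struct workqueue_struct	*done_wq; /* fs-supplied via the iomap */
	struct work_struct	work;	/* post read work (fs-verity etc) */
	struct bio		inline_bio;/* MUST BE LAST! */
};

/* in iomap_read_end_io(): */
	if (read_ioend->done_wq) {
		queue_work(read_ioend->done_wq, &read_ioend->work);
		return;
	}
```

A NULL done_wq means "no post-processing", so filesystems that use
none of these features pay nothing extra in the completion path.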

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
