All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: Christoph Hellwig <hch@lst.de>
Cc: "Darrick J . Wong" <darrick.wong@oracle.com>,
	Damien Le Moal <Damien.LeMoal@wdc.com>,
	Andreas Gruenbacher <agruenba@redhat.com>,
	linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 09/12] iomap: lift the xfs writeback code to iomap
Date: Wed, 16 Oct 2019 09:07:21 +1100	[thread overview]
Message-ID: <20191015220721.GC16973@dread.disaster.area> (raw)
In-Reply-To: <20191015154345.13052-10-hch@lst.de>

On Tue, Oct 15, 2019 at 05:43:42PM +0200, Christoph Hellwig wrote:
> Take the xfs writeback code and move it to fs/iomap.  A new structure
> with three methods is added as the abstraction from the generic writeback
> code to the file system.  These methods are used to map blocks, submit an
> ioend, and cancel a page that encountered an error before it was added to
> an ioend.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/iomap/buffered-io.c | 564 ++++++++++++++++++++++++++++++++++-
>  fs/iomap/trace.h       |  39 +++
>  fs/xfs/xfs_aops.c      | 662 ++++-------------------------------------
>  fs/xfs/xfs_aops.h      |  17 --
>  fs/xfs/xfs_super.c     |  11 +-
>  fs/xfs/xfs_trace.h     |  39 ---
>  include/linux/iomap.h  |  59 ++++
>  7 files changed, 722 insertions(+), 669 deletions(-)
.....
> @@ -468,6 +471,8 @@ EXPORT_SYMBOL_GPL(iomap_is_partially_uptodate);
>  int
>  iomap_releasepage(struct page *page, gfp_t gfp_mask)
>  {
> +	trace_iomap_releasepage(page->mapping->host, page, 0, 0);
> +
>  	/*
>  	 * mm accommodates an old ext3 case where clean pages might not have had
>  	 * the dirty bit cleared. Thus, it can send actual dirty pages to
> @@ -483,6 +488,8 @@ EXPORT_SYMBOL_GPL(iomap_releasepage);
>  void
>  iomap_invalidatepage(struct page *page, unsigned int offset, unsigned int len)
>  {
> +	trace_iomap_invalidatepage(page->mapping->host, page, offset, len);
> +

These tracepoints should be split out into a separate patch like
the readpage(s) tracepoints. Maybe just lift all the non-writeback
ones in a single patch...

>  	/*
>  	 * If we are invalidating the entire page, clear the dirty state from it
>  	 * and release it to avoid unnecessary buildup of the LRU.
> @@ -1084,3 +1091,558 @@ vm_fault_t iomap_page_mkwrite(struct vm_fault *vmf, const struct iomap_ops *ops)
>  	return block_page_mkwrite_return(ret);
>  }
>  EXPORT_SYMBOL_GPL(iomap_page_mkwrite);
> +
> +static void
> +iomap_finish_page_writeback(struct inode *inode, struct bio_vec *bvec,
> +		int error)
> +{
> +	struct iomap_page *iop = to_iomap_page(bvec->bv_page);
> +
> +	if (error) {
> +		SetPageError(bvec->bv_page);
> +		mapping_set_error(inode->i_mapping, -EIO);
> +	}
> +
> +	WARN_ON_ONCE(i_blocksize(inode) < PAGE_SIZE && !iop);
> +	WARN_ON_ONCE(iop && atomic_read(&iop->write_count) <= 0);
> +
> +	if (!iop || atomic_dec_and_test(&iop->write_count))
> +		end_page_writeback(bvec->bv_page);
> +}

Can we just pass the struct page into this function?

.....

> +/*
> + * Submit the bio for an ioend. We are passed an ioend with a bio attached to
> + * it, and we submit that bio. The ioend may be used for multiple bio
> + * submissions, so we only want to allocate an append transaction for the ioend
> + * once.  In the case of multiple bio submission, each bio will take an IO

This needs to be changed to describe what wpc->ops->submit_ioend()
is used for rather than what XFS might use this hook for.

> + * reference to the ioend to ensure that the ioend completion is only done once
> + * all bios have been submitted and the ioend is really done.
> + *
> + * If @error is non-zero, it means that we have a situation where some part of
> + * the submission process has failed after we have marked paged for writeback
> + * and unlocked them. In this situation, we need to fail the bio and ioend
> + * rather than submit it to IO. This typically only happens on a filesystem
> + * shutdown.
> + */
> +static int
> +iomap_submit_ioend(struct iomap_writepage_ctx *wpc, struct iomap_ioend *ioend,
> +		int error)
> +{
> +	ioend->io_bio->bi_private = ioend;
> +	ioend->io_bio->bi_end_io = iomap_writepage_end_bio;
> +
> +	if (wpc->ops->submit_ioend)
> +		error = wpc->ops->submit_ioend(ioend, error);

I'm not sure that "submit_ioend" is the best name for this method,
as it is a pre-bio-submission hook, not an actual IO submission
method. "prepare_ioend_for_submit" is more descriptive, but probably
too long. wpc->ops->prepare_submit(ioend, error) reads pretty well,
though...

> +	if (error) {
> +		/*
> +		 * If we are failing the IO now, just mark the ioend with an
> +		 * error and finish it.  This will run IO completion immediately
> +		 * as there is only one reference to the ioend at this point in
> +		 * time.
> +		 */
> +		ioend->io_bio->bi_status = errno_to_blk_status(error);
> +		bio_endio(ioend->io_bio);
> +		return error;
> +	}
> +
> +	submit_bio(ioend->io_bio);
> +	return 0;
> +}

.....
> +/*
> + * We implement an immediate ioend submission policy here to avoid needing to
> + * chain multiple ioends and hence nest mempool allocations which can violate
> + * forward progress guarantees we need to provide. The current ioend we are
> + * adding blocks to is cached on the writepage context, and if the new block

adding pages to ... , and if the new block mapping

> + * does not append to the cached ioend it will create a new ioend and cache that
> + * instead.
> + *
> + * If a new ioend is created and cached, the old ioend is returned and queued
> + * locally for submission once the entire page is processed or an error has been
> + * detected.  While ioends are submitted immediately after they are completed,
> + * batching optimisations are provided by higher level block plugging.
> + *
> + * At the end of a writeback pass, there will be a cached ioend remaining on the
> + * writepage context that the caller will need to submit.
> + */
> +static int
> +iomap_writepage_map(struct iomap_writepage_ctx *wpc,
> +		struct writeback_control *wbc, struct inode *inode,
> +		struct page *page, u64 end_offset)
> +{
> +	struct iomap_page *iop = to_iomap_page(page);
> +	struct iomap_ioend *ioend, *next;
> +	unsigned len = i_blocksize(inode);
> +	u64 file_offset; /* file offset of page */
> +	int error = 0, count = 0, i;
> +	LIST_HEAD(submit_list);
> +
> +	WARN_ON_ONCE(i_blocksize(inode) < PAGE_SIZE && !iop);
> +	WARN_ON_ONCE(iop && atomic_read(&iop->write_count) != 0);
> +
> +	/*
> +	 * Walk through the page to find areas to write back. If we run off the
> +	 * end of the current map or find the current map invalid, grab a new
> +	 * one.
> +	 */
> +	for (i = 0, file_offset = page_offset(page);
> +	     i < (PAGE_SIZE >> inode->i_blkbits) && file_offset < end_offset;
> +	     i++, file_offset += len) {
> +		if (iop && !test_bit(i, iop->uptodate))
> +			continue;
> +
> +		error = wpc->ops->map_blocks(wpc, inode, file_offset);
> +		if (error)
> +			break;
> +		if (wpc->iomap.type == IOMAP_HOLE)
> +			continue;
> +		iomap_add_to_ioend(inode, file_offset, page, iop, wpc, wbc,
> +				 &submit_list);
> +		count++;
> +	}
> +
> +	WARN_ON_ONCE(!wpc->ioend && !list_empty(&submit_list));
> +	WARN_ON_ONCE(!PageLocked(page));
> +	WARN_ON_ONCE(PageWriteback(page));
> +
> +	/*
> +	 * On error, we have to fail the ioend here because we may have set
> +	 * pages under writeback, we have to make sure we run IO completion to
> +	 * mark the error state of the IO appropriately, so we can't cancel the
> +	 * ioend directly here.

Few too many commas and run-ons here. Maybe reword it like this:

	/*
	 * We cannot cancel the ioend directly here if there is a submission
	 * error. We may have already set pages under writeback and hence we
	 * have to run IO completion to mark the error state of the pages under
	 * writeback appropriately.

>
>
>				That means we have to mark this page as under
> +	 * writeback if we included any blocks from it in the ioend chain so
> +	 * that completion treats it correctly.
> +	 *
> +	 * If we didn't include the page in the ioend, the on error we can
                                                       then on error

> +	 * simply discard and unlock it as there are no other users of the page
> +	 * now.  The caller will still need to trigger submission of outstanding
> +	 * ioends on the writepage context so they are treated correctly on
> +	 * error.
> +	 */

.....

> +static int
> +iomap_do_writepage(struct page *page, struct writeback_control *wbc, void *data)
> +{
> +	struct iomap_writepage_ctx *wpc = data;
> +	struct inode *inode = page->mapping->host;
> +	pgoff_t end_index;
> +	u64 end_offset;
> +	loff_t offset;
> +
> +	trace_iomap_writepage(inode, page, 0, 0);
> +
> +	/*
> +	 * Refuse to write the page out if we are called from reclaim context.
> +	 *
> +	 * This avoids stack overflows when called from deeply used stacks in
> +	 * random callers for direct reclaim or memcg reclaim.  We explicitly
> +	 * allow reclaim from kswapd as the stack usage there is relatively low.
> +	 *
> +	 * This should never happen except in the case of a VM regression so
> +	 * warn about it.
> +	 */
> +	if (WARN_ON_ONCE((current->flags & (PF_MEMALLOC|PF_KSWAPD)) ==
> +			PF_MEMALLOC))
> +		goto redirty;
> +
> +	/*
> +	 * Given that we do not allow direct reclaim to call us, we should
> +	 * never be called while in a filesystem transaction.
> +	 */

	   never be called in a recursive filesystem reclaim context.

> +	if (WARN_ON_ONCE(current->flags & PF_MEMALLOC_NOFS))
> +		goto redirty;
> +

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

  parent reply	other threads:[~2019-10-15 22:07 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-15 15:43 lift the xfs writepage code into iomap v7 Christoph Hellwig
2019-10-15 15:43 ` [PATCH 01/12] xfs: initialize iomap->flags in xfs_bmbt_to_iomap Christoph Hellwig
2019-10-15 18:00   ` Darrick J. Wong
2019-10-15 21:20   ` Dave Chinner
2019-10-15 15:43 ` [PATCH 02/12] xfs: set IOMAP_F_NEW more carefully Christoph Hellwig
2019-10-15 18:01   ` Darrick J. Wong
2019-10-15 21:21   ` Dave Chinner
2019-10-15 15:43 ` [PATCH 03/12] xfs: use a struct iomap in xfs_writepage_ctx Christoph Hellwig
2019-10-15 21:23   ` Dave Chinner
2019-10-15 15:43 ` [PATCH 04/12] xfs: refactor the ioend merging code Christoph Hellwig
2019-10-15 17:56   ` Darrick J. Wong
2019-10-15 17:59     ` Christoph Hellwig
2019-10-15 21:26   ` Dave Chinner
2019-10-15 15:43 ` [PATCH 05/12] xfs: turn io_append_trans into an io_private void pointer Christoph Hellwig
2019-10-15 18:02   ` Darrick J. Wong
2019-10-15 21:28   ` Dave Chinner
2019-10-15 15:43 ` [PATCH 06/12] xfs: remove the fork fields in the writepage_ctx and ioend Christoph Hellwig
2019-10-15 18:05   ` Darrick J. Wong
2019-10-15 21:31   ` Dave Chinner
2019-10-15 15:43 ` [PATCH 07/12] iomap: zero newly allocated mapped blocks Christoph Hellwig
2019-10-15 21:32   ` Dave Chinner
2019-10-15 15:43 ` [PATCH 08/12] iomap: lift the xfs readpage / readpages tracing to iomap Christoph Hellwig
2019-10-15 18:06   ` Darrick J. Wong
2019-10-15 21:35   ` Dave Chinner
2019-10-15 15:43 ` [PATCH 09/12] iomap: lift the xfs writeback code " Christoph Hellwig
2019-10-15 18:40   ` Darrick J. Wong
2019-10-16  7:43     ` Christoph Hellwig
2019-10-15 22:07   ` Dave Chinner [this message]
2019-10-16  5:09     ` Darrick J. Wong
2019-10-16  7:48     ` Christoph Hellwig
2019-10-16 12:08       ` Brian Foster
2019-10-16  8:10     ` Christoph Hellwig
2019-10-15 15:43 ` [PATCH 10/12] iomap: warn on inline maps in iomap_writepage_map Christoph Hellwig
2019-10-15 22:08   ` Dave Chinner
2019-10-15 15:43 ` [PATCH 11/12] iomap: move struct iomap_page out of iomap.h Christoph Hellwig
2019-10-15 22:09   ` Dave Chinner
2019-10-15 15:43 ` [PATCH 12/12] iomap: cleanup iomap_ioend_compare Christoph Hellwig
2019-10-15 17:57   ` Darrick J. Wong
2019-10-15 22:10   ` Dave Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191015220721.GC16973@dread.disaster.area \
    --to=david@fromorbit.com \
    --cc=Damien.LeMoal@wdc.com \
    --cc=agruenba@redhat.com \
    --cc=darrick.wong@oracle.com \
    --cc=hch@lst.de \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.