public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: Christoph Hellwig <hch@lst.de>
Cc: "Darrick J . Wong" <darrick.wong@oracle.com>,
	Damien Le Moal <Damien.LeMoal@wdc.com>,
	Andreas Gruenbacher <agruenba@redhat.com>,
	linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 09/12] iomap: lift the xfs writeback code to iomap
Date: Wed, 16 Oct 2019 09:07:21 +1100	[thread overview]
Message-ID: <20191015220721.GC16973@dread.disaster.area> (raw)
In-Reply-To: <20191015154345.13052-10-hch@lst.de>

On Tue, Oct 15, 2019 at 05:43:42PM +0200, Christoph Hellwig wrote:
> Take the xfs writeback code and move it to fs/iomap.  A new structure
> with three methods is added as the abstraction from the generic writeback
> code to the file system.  These methods are used to map blocks, submit an
> ioend, and cancel a page that encountered an error before it was added to
> an ioend.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/iomap/buffered-io.c | 564 ++++++++++++++++++++++++++++++++++-
>  fs/iomap/trace.h       |  39 +++
>  fs/xfs/xfs_aops.c      | 662 ++++-------------------------------------
>  fs/xfs/xfs_aops.h      |  17 --
>  fs/xfs/xfs_super.c     |  11 +-
>  fs/xfs/xfs_trace.h     |  39 ---
>  include/linux/iomap.h  |  59 ++++
>  7 files changed, 722 insertions(+), 669 deletions(-)
.....
> @@ -468,6 +471,8 @@ EXPORT_SYMBOL_GPL(iomap_is_partially_uptodate);
>  int
>  iomap_releasepage(struct page *page, gfp_t gfp_mask)
>  {
> +	trace_iomap_releasepage(page->mapping->host, page, 0, 0);
> +
>  	/*
>  	 * mm accommodates an old ext3 case where clean pages might not have had
>  	 * the dirty bit cleared. Thus, it can send actual dirty pages to
> @@ -483,6 +488,8 @@ EXPORT_SYMBOL_GPL(iomap_releasepage);
>  void
>  iomap_invalidatepage(struct page *page, unsigned int offset, unsigned int len)
>  {
> +	trace_iomap_invalidatepage(page->mapping->host, page, offset, len);
> +

These tracepoints should be split out into a separate patch like
the readpage(s) tracepoints. Maybe just lift all the non-writeback
ones in a single patch...

>  	/*
>  	 * If we are invalidating the entire page, clear the dirty state from it
>  	 * and release it to avoid unnecessary buildup of the LRU.
> @@ -1084,3 +1091,558 @@ vm_fault_t iomap_page_mkwrite(struct vm_fault *vmf, const struct iomap_ops *ops)
>  	return block_page_mkwrite_return(ret);
>  }
>  EXPORT_SYMBOL_GPL(iomap_page_mkwrite);
> +
> +static void
> +iomap_finish_page_writeback(struct inode *inode, struct bio_vec *bvec,
> +		int error)
> +{
> +	struct iomap_page *iop = to_iomap_page(bvec->bv_page);
> +
> +	if (error) {
> +		SetPageError(bvec->bv_page);
> +		mapping_set_error(inode->i_mapping, -EIO);
> +	}
> +
> +	WARN_ON_ONCE(i_blocksize(inode) < PAGE_SIZE && !iop);
> +	WARN_ON_ONCE(iop && atomic_read(&iop->write_count) <= 0);
> +
> +	if (!iop || atomic_dec_and_test(&iop->write_count))
> +		end_page_writeback(bvec->bv_page);
> +}

Can we just pass the struct page into this function?

.....

> +/*
> + * Submit the bio for an ioend. We are passed an ioend with a bio attached to
> + * it, and we submit that bio. The ioend may be used for multiple bio
> + * submissions, so we only want to allocate an append transaction for the ioend
> + * once.  In the case of multiple bio submission, each bio will take an IO

This needs to be changed to describe what wpc->ops->submit_ioend()
is used for rather than what XFS might use this hook for.

> + * reference to the ioend to ensure that the ioend completion is only done once
> + * all bios have been submitted and the ioend is really done.
> + *
> + * If @error is non-zero, it means that we have a situation where some part of
> + * the submission process has failed after we have marked paged for writeback
> + * and unlocked them. In this situation, we need to fail the bio and ioend
> + * rather than submit it to IO. This typically only happens on a filesystem
> + * shutdown.
> + */
> +static int
> +iomap_submit_ioend(struct iomap_writepage_ctx *wpc, struct iomap_ioend *ioend,
> +		int error)
> +{
> +	ioend->io_bio->bi_private = ioend;
> +	ioend->io_bio->bi_end_io = iomap_writepage_end_bio;
> +
> +	if (wpc->ops->submit_ioend)
> +		error = wpc->ops->submit_ioend(ioend, error);

I'm not sure that "submit_ioend" is the best name for this method,
as it is a pre-bio-submission hook, not an actual IO submission
method. "prepare_ioend_for_submit" is more descriptive, but probably
too long. wpc->ops->prepare_submit(ioend, error) reads pretty well,
though...

> +	if (error) {
> +		/*
> +		 * If we are failing the IO now, just mark the ioend with an
> +		 * error and finish it.  This will run IO completion immediately
> +		 * as there is only one reference to the ioend at this point in
> +		 * time.
> +		 */
> +		ioend->io_bio->bi_status = errno_to_blk_status(error);
> +		bio_endio(ioend->io_bio);
> +		return error;
> +	}
> +
> +	submit_bio(ioend->io_bio);
> +	return 0;
> +}

.....
> +/*
> + * We implement an immediate ioend submission policy here to avoid needing to
> + * chain multiple ioends and hence nest mempool allocations which can violate
> + * forward progress guarantees we need to provide. The current ioend we are
> + * adding blocks to is cached on the writepage context, and if the new block

adding pages to ... , and if the new block mapping

> + * does not append to the cached ioend it will create a new ioend and cache that
> + * instead.
> + *
> + * If a new ioend is created and cached, the old ioend is returned and queued
> + * locally for submission once the entire page is processed or an error has been
> + * detected.  While ioends are submitted immediately after they are completed,
> + * batching optimisations are provided by higher level block plugging.
> + *
> + * At the end of a writeback pass, there will be a cached ioend remaining on the
> + * writepage context that the caller will need to submit.
> + */
> +static int
> +iomap_writepage_map(struct iomap_writepage_ctx *wpc,
> +		struct writeback_control *wbc, struct inode *inode,
> +		struct page *page, u64 end_offset)
> +{
> +	struct iomap_page *iop = to_iomap_page(page);
> +	struct iomap_ioend *ioend, *next;
> +	unsigned len = i_blocksize(inode);
> +	u64 file_offset; /* file offset of page */
> +	int error = 0, count = 0, i;
> +	LIST_HEAD(submit_list);
> +
> +	WARN_ON_ONCE(i_blocksize(inode) < PAGE_SIZE && !iop);
> +	WARN_ON_ONCE(iop && atomic_read(&iop->write_count) != 0);
> +
> +	/*
> +	 * Walk through the page to find areas to write back. If we run off the
> +	 * end of the current map or find the current map invalid, grab a new
> +	 * one.
> +	 */
> +	for (i = 0, file_offset = page_offset(page);
> +	     i < (PAGE_SIZE >> inode->i_blkbits) && file_offset < end_offset;
> +	     i++, file_offset += len) {
> +		if (iop && !test_bit(i, iop->uptodate))
> +			continue;
> +
> +		error = wpc->ops->map_blocks(wpc, inode, file_offset);
> +		if (error)
> +			break;
> +		if (wpc->iomap.type == IOMAP_HOLE)
> +			continue;
> +		iomap_add_to_ioend(inode, file_offset, page, iop, wpc, wbc,
> +				 &submit_list);
> +		count++;
> +	}
> +
> +	WARN_ON_ONCE(!wpc->ioend && !list_empty(&submit_list));
> +	WARN_ON_ONCE(!PageLocked(page));
> +	WARN_ON_ONCE(PageWriteback(page));
> +
> +	/*
> +	 * On error, we have to fail the ioend here because we may have set
> +	 * pages under writeback, we have to make sure we run IO completion to
> +	 * mark the error state of the IO appropriately, so we can't cancel the
> +	 * ioend directly here.

Few too many commas and run-ons here. Maybe reword it like this:

	/*
	 * We cannot cancel the ioend directly here if there is a submission
	 * error. We may have already set pages under writeback and hence we
	 * have to run IO completion to mark the error state of the pages under
	 * writeback appropriately.

>
>
>				That means we have to mark this page as under
> +	 * writeback if we included any blocks from it in the ioend chain so
> +	 * that completion treats it correctly.
> +	 *
> +	 * If we didn't include the page in the ioend, the on error we can
                                                       then on error

> +	 * simply discard and unlock it as there are no other users of the page
> +	 * now.  The caller will still need to trigger submission of outstanding
> +	 * ioends on the writepage context so they are treated correctly on
> +	 * error.
> +	 */

.....

> +static int
> +iomap_do_writepage(struct page *page, struct writeback_control *wbc, void *data)
> +{
> +	struct iomap_writepage_ctx *wpc = data;
> +	struct inode *inode = page->mapping->host;
> +	pgoff_t end_index;
> +	u64 end_offset;
> +	loff_t offset;
> +
> +	trace_iomap_writepage(inode, page, 0, 0);
> +
> +	/*
> +	 * Refuse to write the page out if we are called from reclaim context.
> +	 *
> +	 * This avoids stack overflows when called from deeply used stacks in
> +	 * random callers for direct reclaim or memcg reclaim.  We explicitly
> +	 * allow reclaim from kswapd as the stack usage there is relatively low.
> +	 *
> +	 * This should never happen except in the case of a VM regression so
> +	 * warn about it.
> +	 */
> +	if (WARN_ON_ONCE((current->flags & (PF_MEMALLOC|PF_KSWAPD)) ==
> +			PF_MEMALLOC))
> +		goto redirty;
> +
> +	/*
> +	 * Given that we do not allow direct reclaim to call us, we should
> +	 * never be called while in a filesystem transaction.
> +	 */

	   never be called in a recursive filesystem reclaim context.

> +	if (WARN_ON_ONCE(current->flags & PF_MEMALLOC_NOFS))
> +		goto redirty;
> +

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

  parent reply	other threads:[~2019-10-15 22:07 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-15 15:43 lift the xfs writepage code into iomap v7 Christoph Hellwig
2019-10-15 15:43 ` [PATCH 01/12] xfs: initialize iomap->flags in xfs_bmbt_to_iomap Christoph Hellwig
2019-10-15 18:00   ` Darrick J. Wong
2019-10-15 21:20   ` Dave Chinner
2019-10-15 15:43 ` [PATCH 02/12] xfs: set IOMAP_F_NEW more carefully Christoph Hellwig
2019-10-15 18:01   ` Darrick J. Wong
2019-10-15 21:21   ` Dave Chinner
2019-10-15 15:43 ` [PATCH 03/12] xfs: use a struct iomap in xfs_writepage_ctx Christoph Hellwig
2019-10-15 21:23   ` Dave Chinner
2019-10-15 15:43 ` [PATCH 04/12] xfs: refactor the ioend merging code Christoph Hellwig
2019-10-15 17:56   ` Darrick J. Wong
2019-10-15 17:59     ` Christoph Hellwig
2019-10-15 21:26   ` Dave Chinner
2019-10-15 15:43 ` [PATCH 05/12] xfs: turn io_append_trans into an io_private void pointer Christoph Hellwig
2019-10-15 18:02   ` Darrick J. Wong
2019-10-15 21:28   ` Dave Chinner
2019-10-15 15:43 ` [PATCH 06/12] xfs: remove the fork fields in the writepage_ctx and ioend Christoph Hellwig
2019-10-15 18:05   ` Darrick J. Wong
2019-10-15 21:31   ` Dave Chinner
2019-10-15 15:43 ` [PATCH 07/12] iomap: zero newly allocated mapped blocks Christoph Hellwig
2019-10-15 21:32   ` Dave Chinner
2019-10-15 15:43 ` [PATCH 08/12] iomap: lift the xfs readpage / readpages tracing to iomap Christoph Hellwig
2019-10-15 18:06   ` Darrick J. Wong
2019-10-15 21:35   ` Dave Chinner
2019-10-15 15:43 ` [PATCH 09/12] iomap: lift the xfs writeback code " Christoph Hellwig
2019-10-15 18:40   ` Darrick J. Wong
2019-10-16  7:43     ` Christoph Hellwig
2019-10-15 22:07   ` Dave Chinner [this message]
2019-10-16  5:09     ` Darrick J. Wong
2019-10-16  7:48     ` Christoph Hellwig
2019-10-16 12:08       ` Brian Foster
2019-10-16  8:10     ` Christoph Hellwig
2019-10-15 15:43 ` [PATCH 10/12] iomap: warn on inline maps in iomap_writepage_map Christoph Hellwig
2019-10-15 22:08   ` Dave Chinner
2019-10-15 15:43 ` [PATCH 11/12] iomap: move struct iomap_page out of iomap.h Christoph Hellwig
2019-10-15 22:09   ` Dave Chinner
2019-10-15 15:43 ` [PATCH 12/12] iomap: cleanup iomap_ioend_compare Christoph Hellwig
2019-10-15 17:57   ` Darrick J. Wong
2019-10-15 22:10   ` Dave Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191015220721.GC16973@dread.disaster.area \
    --to=david@fromorbit.com \
    --cc=Damien.LeMoal@wdc.com \
    --cc=agruenba@redhat.com \
    --cc=darrick.wong@oracle.com \
    --cc=hch@lst.de \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox