From: Christoph Hellwig <hch@infradead.org>
To: Chris Mason <clm@fb.com>
Cc: djwong@kernel.org, hch@infradead.org, linux-xfs@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, hannes@cmpxchg.org,
	dchinner@redhat.com
Subject: Re: [PATCH RFC] iomap: invalidate pages past eof in iomap_do_writepage()
Date: Wed, 1 Jun 2022 05:18:49 -0700	[thread overview]
Message-ID: <YpdZKbrtXJJ9mWL7@infradead.org> (raw)
In-Reply-To: <20220601011116.495988-1-clm@fb.com>

This does look sane to me.  How much testing did this get?  Especially
for the block size < page size case?  Also adding Dave, as he has spent
a lot of time on this code.

On Tue, May 31, 2022 at 06:11:17PM -0700, Chris Mason wrote:
> iomap_do_writepage() sends pages past i_size through
> folio_redirty_for_writepage(), which normally isn't a problem because
> truncate and friends clean them very quickly.
> 
> When the system a variety of cgroups,

^^^ This sentence does not parse ^^^

> we can end up in situations where
> one cgroup has almost no dirty pages at all.  This is especially common
> in our XFS workloads in production because they tend to use O_DIRECT for
> everything.
> 
> We've hit storms where the redirty path hits millions of times in a few
> seconds, all on a single file that's only ~40 pages long.  This ends up
> leading to long tail latencies for file writes because the page reclaim
> workers are hogging the CPU from some kworkers bound to the same CPU.
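
To make the loop concrete: a redirtied folio keeps its dirty tag in
the mapping, so the next writeback pass over the inode finds it again
immediately.  Heavily abridged sketch of write_cache_pages() from
mm/page-writeback.c (range cycling, tagging and error handling
omitted, not the literal code):

	while (index <= end) {
		nr_pages = pagevec_lookup_range_tag(&pvec, mapping,
				&index, end, PAGECACHE_TAG_DIRTY);
		for (i = 0; i < nr_pages; i++) {
			struct page *page = pvec.pages[i];

			lock_page(page);
			ret = (*writepage)(page, wbc, data);
			/*
			 * If ->writepage only redirtied the page it is
			 * tagged dirty again, and a later pass over this
			 * mapping finds the same page right away.
			 */
		}
		pagevec_release(&pvec);
	}
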
> 
> That's the theory anyway.  We know the storms exist, but the tail
> latencies are so sporadic that it's hard to have any certainty about the
> cause without patching a large number of boxes.
> 
> There are a few different problems here.  First is just that I don't
> understand how invalidating the page instead of redirtying might upset
> the rest of the iomap/xfs world.  Btrfs invalidates in this case, which
> seems like the right thing to me, but we all have slightly different
> sharp corners in the truncate path so I thought I'd ask for comments.
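
For reference, the btrfs behavior mentioned above is roughly the
following (paraphrased from the EOF check in __extent_writepage() in
fs/btrfs/extent_io.c; abridged, details vary by tree):

	struct folio *folio = page_folio(page);
	loff_t i_size = i_size_read(inode);
	unsigned long end_index = i_size >> PAGE_SHIFT;

	/* Invalidate rather than redirty pages fully past i_size. */
	if (page->index > end_index ||
	    (page->index == end_index && !offset_in_page(i_size))) {
		folio_invalidate(folio, 0, folio_size(folio));
		unlock_page(page);
		return 0;
	}
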
> 
> Second, the VM should take wbc->pages_skipped into account, or use
> some other way to avoid looping over and over.  I think we actually want
> both but I wanted to understand the page invalidation path first.
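
The accounting half of that already exists, for what it's worth:
folio_redirty_for_writepage() bumps wbc->pages_skipped, roughly
(abridged from mm/page-writeback.c, dirty statistics accounting
omitted):

	bool folio_redirty_for_writepage(struct writeback_control *wbc,
			struct folio *folio)
	{
		/* Record that we punted on this folio ... */
		wbc->pages_skipped += folio_nr_pages(folio);
		/* ... and dirty it again so the data is not lost. */
		return filemap_dirty_folio(folio->mapping, folio);
	}

Whether any of the writeback callers actually act on that count is a
separate question.
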
> 
> Signed-off-by: Chris Mason <clm@fb.com>
> Reported-by: Domas Mituzas <domas@fb.com>
> ---
>  fs/iomap/buffered-io.c | 13 +++++++------
>  1 file changed, 7 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 8ce8720093b9..4a687a2a9ed9 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -1482,10 +1482,8 @@ iomap_do_writepage(struct page *page, struct writeback_control *wbc, void *data)
>  		pgoff_t end_index = isize >> PAGE_SHIFT;
>  
>  		/*
> -		 * Skip the page if it's fully outside i_size, e.g. due to a
> -		 * truncate operation that's in progress. We must redirty the
> -		 * page so that reclaim stops reclaiming it. Otherwise
> -		 * iomap_vm_releasepage() is called on it and gets confused.
> +		 * invalidate the page if it's fully outside i_size, e.g.
> +		 * due to a truncate operation that's in progress.
>  		 *
>  		 * Note that the end_index is unsigned long.  If the given
>  		 * offset is greater than 16TB on a 32-bit system then if we
> @@ -1499,8 +1497,10 @@ iomap_do_writepage(struct page *page, struct writeback_control *wbc, void *data)
>  		 * offset is just equal to the EOF.
>  		 */
>  		if (folio->index > end_index ||
> -		    (folio->index == end_index && poff == 0))
> -			goto redirty;
> +		    (folio->index == end_index && poff == 0)) {
> +			folio_invalidate(folio, 0, folio_size(folio));
> +			goto unlock;
> +		}
>  
>  		/*
>  		 * The page straddles i_size.  It must be zeroed out on each
> @@ -1518,6 +1518,7 @@ iomap_do_writepage(struct page *page, struct writeback_control *wbc, void *data)
>  
>  redirty:
>  	folio_redirty_for_writepage(wbc, folio);
> +unlock:
>  	folio_unlock(folio);
>  	return 0;
>  }
> -- 
> 2.30.2
> 
---end quoted text---
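
One more note for anyone not yet up to speed on the folio conversion:
folio_invalidate() as used in the patch just dispatches to the
filesystem's ->invalidate_folio() aop, roughly (abridged from
mm/truncate.c):

	void folio_invalidate(struct folio *folio, size_t offset,
			size_t length)
	{
		const struct address_space_operations *aops =
			folio->mapping->a_ops;

		/* For XFS this should end up in iomap_invalidate_folio(). */
		if (aops->invalidate_folio)
			aops->invalidate_folio(folio, offset, length);
	}
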


Thread overview: 17+ messages
2022-06-01  1:11 [PATCH RFC] iomap: invalidate pages past eof in iomap_do_writepage() Chris Mason
2022-06-01 12:18 ` Christoph Hellwig [this message]
2022-06-01 14:13   ` Chris Mason
2022-06-02  6:52     ` Dave Chinner
2022-06-02 15:32       ` Johannes Weiner
2022-06-02 19:41         ` Chris Mason
2022-06-02 19:59           ` Matthew Wilcox
2022-06-02 22:07             ` Dave Chinner
2022-06-02 22:06         ` Dave Chinner
2022-06-03  1:29           ` Chris Mason
2022-06-03  5:20             ` Dave Chinner
2022-06-03 15:06               ` Johannes Weiner
2022-06-03 16:09                 ` Chris Mason
2022-06-05 23:32                   ` Dave Chinner
2022-06-06 14:46                     ` Johannes Weiner
2022-06-06 15:13                       ` Chris Mason
2022-06-07 22:52                       ` Dave Chinner
