From: Christoph Hellwig <hch@infradead.org>
To: Chris Mason <clm@fb.com>
Cc: djwong@kernel.org, hch@infradead.org, linux-xfs@vger.kernel.org,
linux-fsdevel@vger.kernel.org, hannes@cmpxchg.org,
dchinner@redhat.com
Subject: Re: [PATCH RFC] iomap: invalidate pages past eof in iomap_do_writepage()
Date: Wed, 1 Jun 2022 05:18:49 -0700 [thread overview]
Message-ID: <YpdZKbrtXJJ9mWL7@infradead.org> (raw)
In-Reply-To: <20220601011116.495988-1-clm@fb.com>
This does look sane to me. How much testing did this get? Especially
for the block size < page sie case? Also adding Dave as he has spent
a lot of time on this code.
On Tue, May 31, 2022 at 06:11:17PM -0700, Chris Mason wrote:
> iomap_do_writepage() sends pages past i_size through
> folio_redirty_for_writepage(), which normally isn't a problem because
> truncate and friends clean them very quickly.
>
> When the system a variety of cgroups,
^^^ This sentence does not parse ^^^
> we can end up in situations where
> one cgroup has almost no dirty pages at all. This is especially common
> in our XFS workloads in production because they tend to use O_DIRECT for
> everything.
>
> We've hit storms where the redirty path hits millions of times in a few
> seconds, on all a single file that's only ~40 pages long. This ends up
> leading to long tail latencies for file writes because the page reclaim
> workers are hogging the CPU from some kworkers bound to the same CPU.
>
> That's the theory anyway. We know the storms exist, but the tail
> latencies are so sporadic that it's hard to have any certainty about the
> cause without patching a large number of boxes.
>
> There are a few different problems here. First is just that I don't
> understand how invalidating the page instead of redirtying might upset
> the rest of the iomap/xfs world. Btrfs invalidates in this case, which
> seems like the right thing to me, but we all have slightly different
> sharp corners in the truncate path so I thought I'd ask for comments.
>
> Second is the VM should take wbc->pages_skipped into account, or use
> some other way to avoid looping over and over. I think we actually want
> both but I wanted to understand the page invalidation path first.
>
> Signed-off-by: Chris Mason <clm@fb.com>
> Reported-by: Domas Mituzas <domas@fb.com>
> ---
> fs/iomap/buffered-io.c | 13 +++++++------
> 1 file changed, 7 insertions(+), 6 deletions(-)
>
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 8ce8720093b9..4a687a2a9ed9 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -1482,10 +1482,8 @@ iomap_do_writepage(struct page *page, struct writeback_control *wbc, void *data)
> pgoff_t end_index = isize >> PAGE_SHIFT;
>
> /*
> - * Skip the page if it's fully outside i_size, e.g. due to a
> - * truncate operation that's in progress. We must redirty the
> - * page so that reclaim stops reclaiming it. Otherwise
> - * iomap_vm_releasepage() is called on it and gets confused.
> + * invalidate the page if it's fully outside i_size, e.g.
> + * due to a truncate operation that's in progress.
> *
> * Note that the end_index is unsigned long. If the given
> * offset is greater than 16TB on a 32-bit system then if we
> @@ -1499,8 +1497,10 @@ iomap_do_writepage(struct page *page, struct writeback_control *wbc, void *data)
> * offset is just equal to the EOF.
> */
> if (folio->index > end_index ||
> - (folio->index == end_index && poff == 0))
> - goto redirty;
> + (folio->index == end_index && poff == 0)) {
> + folio_invalidate(folio, 0, folio_size(folio));
> + goto unlock;
> + }
>
> /*
> * The page straddles i_size. It must be zeroed out on each
> @@ -1518,6 +1518,7 @@ iomap_do_writepage(struct page *page, struct writeback_control *wbc, void *data)
>
> redirty:
> folio_redirty_for_writepage(wbc, folio);
> +unlock:
> folio_unlock(folio);
> return 0;
> }
> --
> 2.30.2
>
---end quoted text---
next prev parent reply other threads:[~2022-06-01 12:18 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-06-01 1:11 [PATCH RFC] iomap: invalidate pages past eof in iomap_do_writepage() Chris Mason
2022-06-01 12:18 ` Christoph Hellwig [this message]
2022-06-01 14:13 ` Chris Mason
2022-06-02 6:52 ` Dave Chinner
2022-06-02 15:32 ` Johannes Weiner
2022-06-02 19:41 ` Chris Mason
2022-06-02 19:59 ` Matthew Wilcox
2022-06-02 22:07 ` Dave Chinner
2022-06-02 22:06 ` Dave Chinner
2022-06-03 1:29 ` Chris Mason
2022-06-03 5:20 ` Dave Chinner
2022-06-03 15:06 ` Johannes Weiner
2022-06-03 16:09 ` Chris Mason
2022-06-05 23:32 ` Dave Chinner
2022-06-06 14:46 ` Johannes Weiner
2022-06-06 15:13 ` Chris Mason
2022-06-07 22:52 ` Dave Chinner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YpdZKbrtXJJ9mWL7@infradead.org \
--to=hch@infradead.org \
--cc=clm@fb.com \
--cc=dchinner@redhat.com \
--cc=djwong@kernel.org \
--cc=hannes@cmpxchg.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).