From: Matthew Wilcox <willy@infradead.org>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-fsdevel@vger.kernel.org,
Damien Le Moal <damien.lemoal@opensource.wdc.com>,
Christoph Hellwig <hch@lst.de>,
"Darrick J . Wong" <djwong@kernel.org>
Subject: Re: [RFC PATCH 00/10] Make O_SYNC writethrough
Date: Thu, 5 May 2022 06:07:11 +0100 [thread overview]
Message-ID: <YnNbf9dPhJ3FiHzH@casper.infradead.org> (raw)
In-Reply-To: <20220505045821.GA1949718@dread.disaster.area>
On Thu, May 05, 2022 at 02:58:21PM +1000, Dave Chinner wrote:
> On Tue, May 03, 2022 at 07:39:58AM +0100, Matthew Wilcox (Oracle) wrote:
> > This is very much in development and basically untested, but Damian
> > started describing to me something that he wanted, and I told him he
> > was asking for the wrong thing, and I already had this patch series
> > in progress. If someone wants to pick it up and make it mergable,
> > that'd be grand.
>
> That've very non-descriptive. Saying "someone wanted something, I said it's
> wrong, so here's a patch series about something else" doesn't tell me anything
> about the problem that Damien was trying to solve.
Sorry about that. I was a bit jet-lagged when I wrote it.
> > The idea is that an O_SYNC write is always going to want to write, and
> > we know that at the time we're storing into the page cache. So for an
> > otherwise clean folio, we can skip the part where we dirty the folio,
> > find the dirty folios and wait for their writeback.
>
> What exactly is this shortcut trying to optimise away? A bit of CPU
> time?
>
> O_SYNC is already a write-through operation - we just call
> filemap_write_and_wait_range() once we've copied the data into the
> page cache and dirtied the page. What does skipping the dirty page
> step gain us?
Two things; the original reason I was doing this, and Damien's reason.
My reason: a small write to a large folio will cause the entire folio to
be dirtied and written. This is unnecessary with O_SYNC; we're about
to force the write anyway; we may as well do the write of the part of
the folio which is modified, and skip the whole dirtying step.
Damien's reason: It's racy. Somebody else (... even vmscan) could cause
folios to be written out of order. This matters for ZoneFS because
writing a file out of order is Not Allowed. He was looking at relaxing
O_DIRECT, but I think what he really wants is a writethrough page cache.
> > We can just mark the
> > folio as writeback-in-progress and start the IO there and then (where we
> > know exactly which blocks need to be written, so possibly a smaller I/O
> > than writing the entire page). The existing "find dirty pages, start
> > I/O and wait on them" code will end up waiting on this pre-started I/O
> > to complete, even though it didn't start any of its own I/O.
> >
> > The important part is patch 9. Everything before it is boring prep work.
> > I'm in two minds about whether to keep the 'write_through' bool, or
> > remove it. So feel to read patches 9+10 squashed together, or as if
> > patch 10 doesn't exist. Whichever feels better.
> >
> > The biggest problem with all this is that iomap doesn't have the necessary
> > information to cause extent allocation, so if you do an O_SYNC write
> > to an extent which is HOLE or DELALLOC, we can't do this optimisation.
> > Maybe that doesn't really matter for interesting applications. I suspect
> > it doesn't matter for ZoneFS.
>
> This seems like a lot of complexity for only partial support. It
> introduces races with page dirtying and cleaning, it likely has
> interesting issues with all the VM dirty/writeback accounting
> (because this series is using a completion path that expects the
> submission path has done it's side of the accounting) and it only
> works in certain preconditions are met.
If we want to have better O_SYNC support, I think we can improve those
conditions. For example, XFS could preallocate the blocks before calling
into iomap. Since it's an O_SYNC write, everything is already terrible.
next prev parent reply other threads:[~2022-05-05 5:07 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-05-03 6:39 [RFC PATCH 00/10] Make O_SYNC writethrough Matthew Wilcox (Oracle)
2022-05-03 6:39 ` [RFC PATCH 01/10] iomap: Pass struct iomap to iomap_alloc_ioend() Matthew Wilcox (Oracle)
2022-05-03 6:40 ` [RFC PATCH 02/10] iomap: Remove iomap_writepage_ctx from iomap_can_add_to_ioend() Matthew Wilcox (Oracle)
2022-05-03 6:40 ` [RFC PATCH 03/10] iomap: Do not pass iomap_writepage_ctx to iomap_add_to_ioend() Matthew Wilcox (Oracle)
2022-05-03 6:40 ` [RFC PATCH 04/10] iomap: Accept a NULL iomap_writepage_ctx in iomap_submit_ioend() Matthew Wilcox (Oracle)
2022-05-03 6:40 ` [RFC PATCH 05/10] iomap: Allow a NULL writeback_control argument to iomap_alloc_ioend() Matthew Wilcox (Oracle)
2022-05-03 6:40 ` [RFC PATCH 06/10] iomap: Pass a length to iomap_add_to_ioend() Matthew Wilcox (Oracle)
2022-05-03 6:40 ` [RFC PATCH 07/10] iomap: Reorder functions Matthew Wilcox (Oracle)
2022-05-03 6:40 ` [RFC PATCH 08/10] " Matthew Wilcox (Oracle)
2022-05-03 6:40 ` [RFC PATCH 09/10] iomap: Add writethrough for O_SYNC Matthew Wilcox (Oracle)
2022-05-03 6:40 ` [RFC PATCH 10/10] remove write_through bool Matthew Wilcox (Oracle)
2022-05-03 12:57 ` [RFC PATCH 00/10] Make O_SYNC writethrough Damien Le Moal
2022-05-05 4:58 ` Dave Chinner
2022-05-05 5:07 ` Matthew Wilcox [this message]
2022-05-05 7:05 ` Dave Chinner
2022-05-06 12:03 ` Damien Le Moal
2022-05-10 1:26 ` Dave Chinner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YnNbf9dPhJ3FiHzH@casper.infradead.org \
--to=willy@infradead.org \
--cc=damien.lemoal@opensource.wdc.com \
--cc=david@fromorbit.com \
--cc=djwong@kernel.org \
--cc=hch@lst.de \
--cc=linux-fsdevel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).