linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Matthew Wilcox <willy@infradead.org>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-fsdevel@vger.kernel.org,
	Damien Le Moal <damien.lemoal@opensource.wdc.com>,
	Christoph Hellwig <hch@lst.de>,
	"Darrick J . Wong" <djwong@kernel.org>
Subject: Re: [RFC PATCH 00/10] Make O_SYNC writethrough
Date: Thu, 5 May 2022 06:07:11 +0100	[thread overview]
Message-ID: <YnNbf9dPhJ3FiHzH@casper.infradead.org> (raw)
In-Reply-To: <20220505045821.GA1949718@dread.disaster.area>

On Thu, May 05, 2022 at 02:58:21PM +1000, Dave Chinner wrote:
> On Tue, May 03, 2022 at 07:39:58AM +0100, Matthew Wilcox (Oracle) wrote:
> > This is very much in development and basically untested, but Damian
> > started describing to me something that he wanted, and I told him he
> > was asking for the wrong thing, and I already had this patch series
> > in progress.  If someone wants to pick it up and make it mergable,
> > that'd be grand.
> 
> That've very non-descriptive. Saying "someone wanted something, I said it's
> wrong, so here's a patch series about something else" doesn't tell me anything
> about the problem that Damien was trying to solve.

Sorry about that.  I was a bit jet-lagged when I wrote it.

> > The idea is that an O_SYNC write is always going to want to write, and
> > we know that at the time we're storing into the page cache.  So for an
> > otherwise clean folio, we can skip the part where we dirty the folio,
> > find the dirty folios and wait for their writeback.
> 
> What exactly is this shortcut trying to optimise away? A bit of CPU
> time?
> 
> O_SYNC is already a write-through operation - we just call
> filemap_write_and_wait_range() once we've copied the data into the
> page cache and dirtied the page. What does skipping the dirty page
> step gain us?

Two things; the original reason I was doing this, and Damien's reason.

My reason: a small write to a large folio will cause the entire folio to
be dirtied and written.  This is unnecessary with O_SYNC; we're about
to force the write anyway; we may as well do the write of the part of
the folio which is modified, and skip the whole dirtying step.

Damien's reason: It's racy.  Somebody else (... even vmscan) could cause
folios to be written out of order.  This matters for ZoneFS because
writing a file out of order is Not Allowed.  He was looking at relaxing
O_DIRECT, but I think what he really wants is a writethrough page cache.

> > We can just mark the
> > folio as writeback-in-progress and start the IO there and then (where we
> > know exactly which blocks need to be written, so possibly a smaller I/O
> > than writing the entire page).  The existing "find dirty pages, start
> > I/O and wait on them" code will end up waiting on this pre-started I/O
> > to complete, even though it didn't start any of its own I/O.
> > 
> > The important part is patch 9.  Everything before it is boring prep work.
> > I'm in two minds about whether to keep the 'write_through' bool, or
> > remove it.  So feel to read patches 9+10 squashed together, or as if
> > patch 10 doesn't exist.  Whichever feels better.
> > 
> > The biggest problem with all this is that iomap doesn't have the necessary
> > information to cause extent allocation, so if you do an O_SYNC write
> > to an extent which is HOLE or DELALLOC, we can't do this optimisation.
> > Maybe that doesn't really matter for interesting applications.  I suspect
> > it doesn't matter for ZoneFS.
> 
> This seems like a lot of complexity for only partial support. It
> introduces races with page dirtying and cleaning, it likely has
> interesting issues with all the VM dirty/writeback accounting
> (because this series is using a completion path that expects the
> submission path has done it's side of the accounting) and it only
> works in certain preconditions are met.

If we want to have better O_SYNC support, I think we can improve those
conditions.  For example, XFS could preallocate the blocks before calling
into iomap.  Since it's an O_SYNC write, everything is already terrible.

  reply	other threads:[~2022-05-05  5:07 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-05-03  6:39 [RFC PATCH 00/10] Make O_SYNC writethrough Matthew Wilcox (Oracle)
2022-05-03  6:39 ` [RFC PATCH 01/10] iomap: Pass struct iomap to iomap_alloc_ioend() Matthew Wilcox (Oracle)
2022-05-03  6:40 ` [RFC PATCH 02/10] iomap: Remove iomap_writepage_ctx from iomap_can_add_to_ioend() Matthew Wilcox (Oracle)
2022-05-03  6:40 ` [RFC PATCH 03/10] iomap: Do not pass iomap_writepage_ctx to iomap_add_to_ioend() Matthew Wilcox (Oracle)
2022-05-03  6:40 ` [RFC PATCH 04/10] iomap: Accept a NULL iomap_writepage_ctx in iomap_submit_ioend() Matthew Wilcox (Oracle)
2022-05-03  6:40 ` [RFC PATCH 05/10] iomap: Allow a NULL writeback_control argument to iomap_alloc_ioend() Matthew Wilcox (Oracle)
2022-05-03  6:40 ` [RFC PATCH 06/10] iomap: Pass a length to iomap_add_to_ioend() Matthew Wilcox (Oracle)
2022-05-03  6:40 ` [RFC PATCH 07/10] iomap: Reorder functions Matthew Wilcox (Oracle)
2022-05-03  6:40 ` [RFC PATCH 08/10] " Matthew Wilcox (Oracle)
2022-05-03  6:40 ` [RFC PATCH 09/10] iomap: Add writethrough for O_SYNC Matthew Wilcox (Oracle)
2022-05-03  6:40 ` [RFC PATCH 10/10] remove write_through bool Matthew Wilcox (Oracle)
2022-05-03 12:57 ` [RFC PATCH 00/10] Make O_SYNC writethrough Damien Le Moal
2022-05-05  4:58 ` Dave Chinner
2022-05-05  5:07   ` Matthew Wilcox [this message]
2022-05-05  7:05     ` Dave Chinner
2022-05-06 12:03       ` Damien Le Moal
2022-05-10  1:26         ` Dave Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YnNbf9dPhJ3FiHzH@casper.infradead.org \
    --to=willy@infradead.org \
    --cc=damien.lemoal@opensource.wdc.com \
    --cc=david@fromorbit.com \
    --cc=djwong@kernel.org \
    --cc=hch@lst.de \
    --cc=linux-fsdevel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).