linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Damien Le Moal <damien.lemoal@opensource.wdc.com>
To: "Matthew Wilcox (Oracle)" <willy@infradead.org>,
	linux-fsdevel@vger.kernel.org
Cc: Christoph Hellwig <hch@lst.de>, "Darrick J . Wong" <djwong@kernel.org>
Subject: Re: [RFC PATCH 00/10] Make O_SYNC writethrough
Date: Tue, 3 May 2022 21:57:10 +0900	[thread overview]
Message-ID: <f43fdbda-ba95-8864-3fd5-e9251e0de7a6@opensource.wdc.com> (raw)
In-Reply-To: <20220503064008.3682332-1-willy@infradead.org>

On 2022/05/03 15:39, Matthew Wilcox (Oracle) wrote:
> This is very much in development and basically untested, but Damian

s/Damian/Damien :)

Thank you for posting this. I am definitely going to play with this with zonefs.

The goal is to allow replacing the mandatory O_DIRECT writing of sequential zone
files with sector aligned O_SYNC writes which "preload" the page cache for
subsequent buffered reads, thus reducing device accesses. That will also avoid
an annoying overhead with zonefs which is that applications need 2 file
descriptors per zone file: one without O_DIRECT for buffered reads and another
O_DIRECT one for writes.

In the case of zonefs, since all sequential files are always fully mapped,
allocated, cannot be used for mmap writing *and* a write is never an overwrite,
these conditions:

+	if (folio_test_dirty(folio))
+		return true;

+	/* Can't allocate blocks here because we don't have ->prepare_ioend */
+	if (iomap->type != IOMAP_MAPPED || iomap->type != IOMAP_UNWRITTEN ||
+	    iomap->flags & IOMAP_F_SHARED)
+		return false;

never trigger and the writethrough is always started with
folio_start_writeback(), essentially becoming a "direct" write from the issuer
context (under the inode lock) on the entire folio. And that should guarantee
that writes stay sequential as they must.

> started describing to me something that he wanted, and I told him he
> was asking for the wrong thing, and I already had this patch series
> in progress.  If someone wants to pick it up and make it mergable,
> that'd be grand.
> 
> The idea is that an O_SYNC write is always going to want to write, and
> we know that at the time we're storing into the page cache.  So for an
> otherwise clean folio, we can skip the part where we dirty the folio,
> find the dirty folios and wait for their writeback.  We can just mark the
> folio as writeback-in-progress and start the IO there and then (where we
> know exactly which blocks need to be written, so possibly a smaller I/O
> than writing the entire page).  The existing "find dirty pages, start
> I/O and wait on them" code will end up waiting on this pre-started I/O
> to complete, even though it didn't start any of its own I/O.
> 
> The important part is patch 9.  Everything before it is boring prep work.
> I'm in two minds about whether to keep the 'write_through' bool, or
> remove it.  So feel to read patches 9+10 squashed together, or as if
> patch 10 doesn't exist.  Whichever feels better.
> 
> The biggest problem with all this is that iomap doesn't have the necessary
> information to cause extent allocation, so if you do an O_SYNC write
> to an extent which is HOLE or DELALLOC, we can't do this optimisation.
> Maybe that doesn't really matter for interesting applications.  I suspect
> it doesn't matter for ZoneFS.
> 
> Matthew Wilcox (Oracle) (10):
>   iomap: Pass struct iomap to iomap_alloc_ioend()
>   iomap: Remove iomap_writepage_ctx from iomap_can_add_to_ioend()
>   iomap: Do not pass iomap_writepage_ctx to iomap_add_to_ioend()
>   iomap: Accept a NULL iomap_writepage_ctx in iomap_submit_ioend()
>   iomap: Allow a NULL writeback_control argument to iomap_alloc_ioend()
>   iomap: Pass a length to iomap_add_to_ioend()
>   iomap: Reorder functions
>   iomap: Reorder functions
>   iomap: Add writethrough for O_SYNC
>   remove write_through bool
> 
>  fs/iomap/buffered-io.c | 492 +++++++++++++++++++++++------------------
>  1 file changed, 273 insertions(+), 219 deletions(-)
> 


-- 
Damien Le Moal
Western Digital Research

  parent reply	other threads:[~2022-05-03 12:57 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-05-03  6:39 [RFC PATCH 00/10] Make O_SYNC writethrough Matthew Wilcox (Oracle)
2022-05-03  6:39 ` [RFC PATCH 01/10] iomap: Pass struct iomap to iomap_alloc_ioend() Matthew Wilcox (Oracle)
2022-05-03  6:40 ` [RFC PATCH 02/10] iomap: Remove iomap_writepage_ctx from iomap_can_add_to_ioend() Matthew Wilcox (Oracle)
2022-05-03  6:40 ` [RFC PATCH 03/10] iomap: Do not pass iomap_writepage_ctx to iomap_add_to_ioend() Matthew Wilcox (Oracle)
2022-05-03  6:40 ` [RFC PATCH 04/10] iomap: Accept a NULL iomap_writepage_ctx in iomap_submit_ioend() Matthew Wilcox (Oracle)
2022-05-03  6:40 ` [RFC PATCH 05/10] iomap: Allow a NULL writeback_control argument to iomap_alloc_ioend() Matthew Wilcox (Oracle)
2022-05-03  6:40 ` [RFC PATCH 06/10] iomap: Pass a length to iomap_add_to_ioend() Matthew Wilcox (Oracle)
2022-05-03  6:40 ` [RFC PATCH 07/10] iomap: Reorder functions Matthew Wilcox (Oracle)
2022-05-03  6:40 ` [RFC PATCH 08/10] " Matthew Wilcox (Oracle)
2022-05-03  6:40 ` [RFC PATCH 09/10] iomap: Add writethrough for O_SYNC Matthew Wilcox (Oracle)
2022-05-03  6:40 ` [RFC PATCH 10/10] remove write_through bool Matthew Wilcox (Oracle)
2022-05-03 12:57 ` Damien Le Moal [this message]
2022-05-05  4:58 ` [RFC PATCH 00/10] Make O_SYNC writethrough Dave Chinner
2022-05-05  5:07   ` Matthew Wilcox
2022-05-05  7:05     ` Dave Chinner
2022-05-06 12:03       ` Damien Le Moal
2022-05-10  1:26         ` Dave Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f43fdbda-ba95-8864-3fd5-e9251e0de7a6@opensource.wdc.com \
    --to=damien.lemoal@opensource.wdc.com \
    --cc=djwong@kernel.org \
    --cc=hch@lst.de \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).