From: "Theodore Ts'o" <tytso@mit.edu>
To: Matthew Wilcox <willy@infradead.org>
Cc: lsf-pc@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org,
linux-mm <linux-mm@kvack.org>
Subject: Re: [LSF/MM/BPF TOPIC] untorn buffered writes
Date: Wed, 28 Feb 2024 17:33:54 -0600
Message-ID: <20240228233354.GC177082@mit.edu>
In-Reply-To: <Zd8--pYHdnjefncj@casper.infradead.org>
On Wed, Feb 28, 2024 at 02:11:06PM +0000, Matthew Wilcox wrote:
> I'm not entirely sure that it does become a mess. If our implementation
> of this ensures that each write ends up in a single folio (even if the
> entire folio is larger than the write), then we will have satisfied the
> semantics of the flag.
What if we do a 32k write which spans two folios? And what
if the physical pages for those 32k in the buffer cache are not
contiguous? Are you going to have to join the two 16k folios
together, or maybe two 8k folios and a 16k folio, and relocate pages
to make a contiguous 32k folio when we do a buffered RWF_ATOMIC write
of size 32k?
Folios have to consist of physically contiguous pages, right? But we
can send a single 32k write request using scatter-gather even if
the pages are not physically contiguous. So it seems to me that
trying to overload the folio size to represent the "atomic write
guarantee" of RWF_ATOMIC is unwise.
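To make the arithmetic concrete, here is a minimal sketch (not kernel code) of the "does this write fit in one folio" question. The helper names are hypothetical, and it assumes uniformly sized, naturally aligned folios, which is a simplification of a real page cache where folio sizes can be mixed:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if the byte range [pos, pos + len) lies
 * entirely inside one naturally aligned folio of folio_size bytes.
 * Assumes folio_size is a power of two.
 */
static bool write_fits_one_folio(uint64_t pos, uint64_t len,
                                 uint64_t folio_size)
{
    assert(folio_size != 0 && (folio_size & (folio_size - 1)) == 0);
    if (len == 0 || len > folio_size)
        return false;
    /* The first and last byte must fall in the same folio index. */
    return (pos / folio_size) == ((pos + len - 1) / folio_size);
}

/*
 * Hypothetical helper: how many folios of folio_size bytes the
 * range [pos, pos + len) touches.
 */
static uint64_t folios_spanned(uint64_t pos, uint64_t len,
                               uint64_t folio_size)
{
    assert(folio_size != 0 && len != 0);
    return (pos + len - 1) / folio_size - pos / folio_size + 1;
}
```

With 16k folios, a 32k write at offset 0 touches two folios, so under the "one write, one folio" rule the page cache would first have to produce a single 32k folio for that range.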
(And yes, the database might not need it to be a 32k untorn write, but
what if it sends a 32k write, for example because it's writing a set
of pages to the database journal file? The RWF_ATOMIC interface
doesn't *know* what is really required, the only thing it knows is the
overly strong guarantees that we set in the definition of that
interface. Or are we going to make the RWF_ATOMIC interface fail all
writes that aren't exactly 16k? That seems.... baroque.)
> I think we'd be better off treating RWF_ATOMIC like it's a bs>PS device.
> That takes two somewhat special cases and makes them use the same code
> paths, which probably means fewer bugs as both camps will be testing
> the same code.
But for a bs > PS device, where the logical block size is greater than
the page size, you don't need the RWF_ATOMIC flag at all. All direct
I/O writes *must* be a multiple of the logical sector size, and
buffered writes, if they are smaller than the block size, *must* be
handled as a read-modify-write, since you can't send writes to the
device smaller than the logical sector size.
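As a rough illustration of that read-modify-write rule, the sketch below rounds a buffered write out to logical-block boundaries. The struct and function names are made up for the example, and it assumes a power-of-two block size; it is not how any particular filesystem implements this:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type: a byte range with an exclusive end. */
struct byte_range {
    uint64_t start;
    uint64_t end;
};

/*
 * Round [pos, pos + len) outward to block_size boundaries. This is
 * the smallest range the device can actually be asked to write.
 */
static struct byte_range rmw_range(uint64_t pos, uint64_t len,
                                   uint64_t block_size)
{
    struct byte_range r;

    assert(block_size != 0 && (block_size & (block_size - 1)) == 0);
    assert(len != 0);
    r.start = pos & ~(block_size - 1);
    r.end = (pos + len + block_size - 1) & ~(block_size - 1);
    return r;
}

/*
 * Nonzero if the write does not cover whole blocks, so the existing
 * block contents must be read before the write can be issued.
 */
static int needs_read(uint64_t pos, uint64_t len, uint64_t block_size)
{
    struct byte_range r = rmw_range(pos, len, block_size);

    return r.start != pos || r.end != pos + len;
}
```

For a 16k logical block size, a 4k buffered write at offset 20k expands to the range [16k, 32k) and needs a read first, while a 16k write at offset 16k already covers exactly one block and does not.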
This is why I claim that LBS devices and untorn writes are largely
orthogonal; for LBS devices no special API is needed at all, and
certainly not the highly problematic RWF_ATOMIC API that has been
proposed. (Well, not problematic for Direct I/O, which is what we had
originally focused upon, but highly problematic for buffered I/O.)
- Ted