From: "Theodore Ts'o" <tytso@mit.edu>
To: John Garry <john.g.garry@oracle.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>,
Davidlohr Bueso <dave@stgolabs.net>,
lsf-pc@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org,
linux-mm <linux-mm@kvack.org>,
"Martin K. Petersen" <martin.petersen@oracle.com>,
Matthew Wilcox <willy@infradead.org>,
Dave Chinner <david@fromorbit.com>,
linux-kernel@vger.kernel.org, catherine.hoang@oracle.com
Subject: Re: [LSF/MM/BPF TOPIC] untorn buffered writes
Date: Sat, 1 Jun 2024 11:33:25 +0200
Message-ID: <20240601093325.GC247052@mit.edu>
In-Reply-To: <bf638db9-c4d3-44bd-a92c-d36e3d95adb6@oracle.com>

On Thu, May 23, 2024 at 12:59:57PM +0100, John Garry wrote:
>
> That's my point really. There were some positive discussion. I put across
> the idea of implementing buffered atomic writes, and now I want to ensure
> that everyone is satisfied with that going forward. I think that a LWN
> report is now being written.

I checked in with some PostgreSQL developers after LSF/MM, and
unfortunately, the idea of immediately sending atomic buffered I/O
directly to the storage device is going to be problematic for them.
The problem is that they depend on the kernel to coalesce writes for
them. So if they are doing a large database commit that involves
touching hundreds or thousands of 16k database pages, they today issue
a separate buffered write request for each database page. So if we
turn each one into an immediate SCSI/NVMe write request, that would be
disastrous for performance. Yes, when they migrate to using Direct
I/O, the database is going to have to figure out how to coalesce write
requests; but this is why it's going to take at least 3 years to make
this migration (and some will call this hopelessly optimistic), and
then users will probably wait another 3 to 5 years before they trust
that the database rewrite to use Direct I/O has gotten it right and
trust their enterprise workloads to it....
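
To make the coalescing point concrete, here is a minimal sketch of the
kind of merging a database would have to do itself once it moves to
Direct I/O and can no longer lean on kernel writeback. It is only an
illustration, not PostgreSQL code: the function name coalesce_pages and
the 16k default page size are assumptions chosen to match the example
above.

```python
def coalesce_pages(page_numbers, page_size=16 * 1024):
    """Merge dirty page numbers into contiguous (offset, length) byte extents.

    Hypothetical helper: each returned extent could be submitted as one
    larger write instead of one write per database page.
    """
    extents = []
    for page in sorted(set(page_numbers)):
        offset = page * page_size
        if extents and extents[-1][0] + extents[-1][1] == offset:
            # This page starts exactly where the previous extent ends: grow it.
            start, length = extents[-1]
            extents[-1] = (start, length + page_size)
        else:
            # Gap before this page: start a new extent.
            extents.append((offset, page_size))
    return extents


if __name__ == "__main__":
    # Six dirty 16k pages collapse into three write requests.
    for offset, length in coalesce_pages([0, 1, 2, 5, 6, 9]):
        print(offset, length)
```

With buffered I/O the page cache effectively does this merging at
writeback time, which is why issuing one write per page is cheap today.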

So I think this goes back to either (a) trying to track which writes
we've promised atomic write semantics, or (b) using a completely
different API that only promises "untorn writes with a specified
granularity" for the untorn buffered writes I/O interface, in
addition to, or instead of, the current "atomic write" interface
which we are currently trying to promulgate for Direct I/O.
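
As a rough illustration of what option (b) would promise, the sketch
below splits a write into granularity-sized units, each of which would
have to reach storage untorn, without any guarantee that the write as a
whole is all-or-nothing. No such interface exists in this thread; the
function name untorn_units, its alignment rule, and the 16k default are
all assumptions made for the example.

```python
def untorn_units(offset, length, granularity=16 * 1024):
    """Return the (offset, length) units that must each be written untorn.

    Hypothetical semantics: the write must be aligned to, and a multiple
    of, the granularity. Each unit is individually untorn; the units are
    NOT guaranteed to land together as one atomic write.
    """
    if granularity <= 0:
        raise ValueError("granularity must be positive")
    if offset % granularity or length % granularity:
        raise ValueError("offset and length must be multiples of granularity")
    return [(off, granularity)
            for off in range(offset, offset + length, granularity)]


if __name__ == "__main__":
    # A 48k write at offset 32k covers three independent 16k units.
    print(untorn_units(32 * 1024, 48 * 1024))
```

Under these assumed semantics the kernel would be free to merge or
delay the units as it likes, as long as no single unit is ever torn.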

Personally, I'd advocate for two separate interfaces; one for "atomic"
I/O's, and a different one for "untorn writes with a specified
guaranteed granularity". And if XFS folks want to turn the atomic I/O
interface into something where you can do a multi-megabyte atomic
write, which requires allocating new blocks and atomically mutating
the file system metadata to do this kind of atomicity --- even though
the Database folks Don't Care --- God bless.
But let's have something which *just* promises the guarantee requested
by the primary requesters of this interface, at least for the
buffered I/O case.

Cheers,

- Ted