From: Pankaj Raghav <pankaj.raghav@linux.dev>
To: Jan Kara <jack@suse.cz>, Andres Freund <andres@anarazel.de>
Cc: Ojaswin Mujoo <ojaswin@linux.ibm.com>,
linux-xfs@vger.kernel.org, linux-mm@kvack.org,
linux-fsdevel@vger.kernel.org, lsf-pc@lists.linux-foundation.org,
djwong@kernel.org, john.g.garry@oracle.com, willy@infradead.org,
hch@lst.de, ritesh.list@gmail.com,
Luis Chamberlain <mcgrof@kernel.org>,
dchinner@redhat.com, Javier Gonzalez <javier.gonz@samsung.com>,
gost.dev@samsung.com, tytso@mit.edu, p.raghav@samsung.com,
vi.shah@samsung.com
Subject: Re: [LSF/MM/BPF TOPIC] Buffered atomic writes
Date: Tue, 17 Feb 2026 13:42:35 +0100
Message-ID: <4627056f-2ab9-4ff1-bca0-5d80f8f0bbab@linux.dev>
In-Reply-To: <wkczfczlmstoywbmgfrxzm6ko4frjsu65kvpwquzu7obrjcd3f@6gs5nsfivc6v>

On 2/17/2026 1:06 PM, Jan Kara wrote:
> On Mon 16-02-26 10:45:40, Andres Freund wrote:
>>> Hmm, IIUC, postgres will write their dirty buffer cache by combining
>>> multiple DB pages based on `io_combine_limit` (typically 128kB).
>>
>> We will try to do that, but it's obviously far from always possible: in some
>> workloads [parts of] the data in the buffer pool will rarely be dirtied in
>> consecutive blocks.
>>
>> FWIW, postgres already tries to force some just-written pages into
>> writeback. For sources of writes that can be plentiful and are done in the
>> background, we default to issuing sync_file_range(SYNC_FILE_RANGE_WRITE),
>> after 256kB-512kB of writes, as otherwise foreground latency can be
>> significantly impacted by the kernel deciding to suddenly write back (due to
>> dirty_writeback_centisecs, dirty_background_bytes, ...) and because otherwise
>> the fsyncs at the end of a checkpoint can be unpredictably slow. For
>> foreground writes we do not default to that, as there are users that won't
>> (because they don't know, because they overcommit hardware, ...) size
>> postgres' buffer pool to be big enough and thus will often re-dirty pages that
>> have already recently been written out to the operating system. But for many
>> workloads it's recommended that users turn on
>> sync_file_range(SYNC_FILE_RANGE_WRITE) for foreground writes as well (*).
>>
>> So for many workloads it'd be fine to just always start writeback for atomic
>> writes immediately. It's possible, but I am not at all sure, that for most of
>> the other workloads, the gains from atomic writes will outstrip the cost of
>> more frequently writing data back.
>
> OK, good. Then I think it's worth a try.
>
>> (*) As it turns out, it often seems to improve write throughput as well: if
>> writeback is triggered by memory pressure instead of SYNC_FILE_RANGE_WRITE,
>> linux seems to often trigger a lot more small random IO.
>>
>>> So immediately writing them might be ok as long as we don't remove those
>>> pages from the page cache like we do in RWF_UNCACHED.
>>
>> Yes, it might. I actually often have wished for something like a
>> RWF_WRITEBACK flag...
>
> I'd call it RWF_WRITETHROUGH but otherwise it makes sense.
>
One naive question: semantically, what would be the difference between
RWF_DSYNC and RWF_WRITETHROUGH? Would RWF_DSYNC be the sync version, and
RWF_WRITETHROUGH an async version where we kick off writeback immediately
in the background and return without waiting for it to complete?
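
To make the question concrete, here is a rough userspace sketch of the two
behaviours as I understand them. RWF_WRITETHROUGH does not exist today, so
the second helper only approximates the proposed semantics with pwrite()
followed by sync_file_range(SYNC_FILE_RANGE_WRITE), i.e. what postgres is
described as doing above. The helper names are made up for illustration,
and the real flag may well end up behaving differently.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Sync version: pwritev2() with RWF_DSYNC returns only after the data
 * (and the metadata needed to read it back) has reached stable storage.
 */
static ssize_t write_dsync(int fd, const void *buf, size_t len, off_t off)
{
	struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };

	return pwritev2(fd, &iov, 1, off, RWF_DSYNC);
}

/*
 * Async approximation of the proposed (hypothetical) RWF_WRITETHROUGH:
 * do a normal buffered write, then ask the kernel to start writeback of
 * just that range. SYNC_FILE_RANGE_WRITE does not wait for the I/O to
 * complete and gives no durability guarantee; the pages stay in the
 * page cache.
 */
static ssize_t write_kick_writeback(int fd, const void *buf, size_t len,
				    off_t off)
{
	ssize_t ret = pwrite(fd, buf, len, off);

	if (ret > 0 && sync_file_range(fd, off, ret, SYNC_FILE_RANGE_WRITE) < 0)
		return -1;
	return ret;
}
```

In the first case the caller pays the device latency on every write; in
the second, the write returns as soon as writeback has been queued and an
fsync() is still needed for durability.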
--
Pankaj