From: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
To: Jeff Layton <jlayton@kernel.org>,
Alexander Viro <viro@zeniv.linux.org.uk>,
Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
"Matthew Wilcox (Oracle)" <willy@infradead.org>,
Andrew Morton <akpm@linux-foundation.org>,
David Hildenbrand <david@kernel.org>,
Lorenzo Stoakes <ljs@kernel.org>,
"Liam R. Howlett" <Liam.Howlett@oracle.com>,
Vlastimil Babka <vbabka@kernel.org>,
Mike Rapoport <rppt@kernel.org>,
Suren Baghdasaryan <surenb@google.com>,
Michal Hocko <mhocko@suse.com>, Mike Snitzer <snitzer@kernel.org>,
Chuck Lever <chuck.lever@oracle.com>
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-nfs@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH 1/4] mm: fix IOCB_DONTCACHE write performance with rate-limited writeback
Date: Thu, 02 Apr 2026 18:10:32 +0530 [thread overview]
Message-ID: <h5ptmt6n.ritesh.list@gmail.com> (raw)
In-Reply-To: <09672fa10c77d4fbfa1a13ea16aedf79d23fd8f8.camel@kernel.org>
Jeff Layton <jlayton@kernel.org> writes:
> On Thu, 2026-04-02 at 10:13 +0530, Ritesh Harjani wrote:
>> Jeff Layton <jlayton@kernel.org> writes:
>>
>> > IOCB_DONTCACHE calls filemap_flush_range() with nr_to_write=LONG_MAX
>> > on every write, which flushes all dirty pages in the written range.
>> > Under concurrent writers this creates severe serialization on the
>> > writeback submission path, causing throughput to collapse to ~47% of
>> > buffered I/O with multi-second tail latency.
>>
>> Yes, between concurrent writers, I agree with the theory.
>>
>>
>> > Even single-client
>> > sequential writes suffer: on a 512GB file with 256GB RAM, the
>> > aggressive flushing triggers dirty throttling that limits throughput
>> > to 575 MB/s vs 1442 MB/s with rate-limited writeback.
>>
>> I am not sure if this 2.5x performance penalty in a "single" sequential
Sorry my bad.. I mis-understood this 2.5x delta at first.
So in a single sequential write case, what this patch is mainly
improving is from unpatched RWF_DONTCACHE (1179 MB/s) to patched
RWF_DONTCACHE (1453 MB/s) = ~23% improvement.
So the below theory which I was talking about was from this delta
perspective i.e. comparing unpatched v/s patched RWF_DONTCACHE mode.
>> writer is due to throttling logic. On giving it some thoughts, I suspect
>> if this is because, the submission side and the completion side both
>> takes the xa_lock and hence could be contending on that.
>>
>> For e.g. since this patch skips doing the flush the second time, (note
>> that writeback is active when the same writer dirtied the page during
>> previous write), this allows the writer to do more work of writing data
>> to page cache pages, instead of waiting on the xa_lock which the
>> completion callback could be holding (folio_end_writeback() -> folio_end_dropbehind())
>>
>> If I see Peak Dirty data from the link you shared [1] in single writer case...
>>
>> Mode MB/s p50 (ms) p99 (ms) p99.9 (ms) Peak Dirty Peak Cache
>> dontcache (unpatched) 1179 3.2 103.3 170.9 14 MB 4.7 GB
>> dontcache (patched) 1453 5.4 43.8 57.4 36 GB 45 GB
>>
>> ... this too shows that the submission side is writing more dirty pages,
>> then the completion side able to write it...
>>
>> I suspect this contention (between submission and completion) could more
>> in IOCB_DONTCACHE case, since the completion side also removes the folio
>> from the page cache within the same xa_lock, which is not the same with
>> normal buffered writes.
>>
>> Maybe a perf callgraph showing the contention would be nicer thing to add
>> here [1] ;).
>>
>> [1]: https://markdownpastebin.com/?id=96249deb897a401ba32acbce05312dcc
>>
>
> That's an interesting point.
>
> The theory I've been operating on is that the flusher thread ends up
> squatting on the xa_lock for a while when memory gets tight, and that
> blocks other readers and writers. Staying ahead of the dirty limits and
> limiting the amount of flush work that each writer does alleviates
> contention for that lock and that's what improves the performance.
>
That's right for comparison between buffered write against RWF_DONTCACHE.
But what I meant in above was for the improvement from 1179 MB/s to 1453
MB/s could be accounted to less contention on xa_lock on patched version
v/s unpatched version for single write sequential testcase.
> You're right though. I'll plan to play around with perf and see if I
> can confirm the theory.
>
Yes, thanks, that will be nice to have!
-ritesh
next prev parent reply other threads:[~2026-04-02 12:52 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-01 19:10 [PATCH 0/4] mm: improve write performance with RWF_DONTCACHE Jeff Layton
2026-04-01 19:10 ` [PATCH 1/4] mm: fix IOCB_DONTCACHE write performance with rate-limited writeback Jeff Layton
2026-04-02 4:43 ` Ritesh Harjani
2026-04-02 11:59 ` Jeff Layton
2026-04-02 12:40 ` Ritesh Harjani [this message]
2026-04-02 5:21 ` Christoph Hellwig
2026-04-02 12:28 ` Jeff Layton
2026-04-06 5:44 ` Christoph Hellwig
2026-04-01 19:10 ` [PATCH 2/4] mm: add atomic flush guard for IOCB_DONTCACHE writeback Jeff Layton
2026-04-02 5:27 ` Christoph Hellwig
2026-04-02 12:49 ` Jeff Layton
2026-04-06 5:49 ` Christoph Hellwig
2026-04-06 13:32 ` Jeff Layton
2026-04-07 5:19 ` Christoph Hellwig
2026-04-01 19:11 ` [PATCH 3/4] testing: add nfsd-io-bench NFS server benchmark suite Jeff Layton
2026-04-01 19:11 ` [PATCH 4/4] testing: add dontcache-bench local filesystem " Jeff Layton
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=h5ptmt6n.ritesh.list@gmail.com \
--to=ritesh.list@gmail.com \
--cc=Liam.Howlett@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=brauner@kernel.org \
--cc=chuck.lever@oracle.com \
--cc=david@kernel.org \
--cc=jack@suse.cz \
--cc=jlayton@kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux-nfs@vger.kernel.org \
--cc=ljs@kernel.org \
--cc=mhocko@suse.com \
--cc=rppt@kernel.org \
--cc=snitzer@kernel.org \
--cc=surenb@google.com \
--cc=vbabka@kernel.org \
--cc=viro@zeniv.linux.org.uk \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox