From: Jeff Layton <jlayton@kernel.org>
To: Christoph Hellwig <hch@infradead.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>,
Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
"Matthew Wilcox (Oracle)" <willy@infradead.org>,
Andrew Morton <akpm@linux-foundation.org>,
David Hildenbrand <david@kernel.org>,
Lorenzo Stoakes <ljs@kernel.org>,
"Liam R. Howlett" <Liam.Howlett@oracle.com>,
Vlastimil Babka <vbabka@kernel.org>,
Mike Rapoport <rppt@kernel.org>,
Suren Baghdasaryan <surenb@google.com>,
Michal Hocko <mhocko@suse.com>,
Mike Snitzer <snitzer@kernel.org>,
Chuck Lever <chuck.lever@oracle.com>,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-nfs@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH 2/4] mm: add atomic flush guard for IOCB_DONTCACHE writeback
Date: Mon, 06 Apr 2026 09:32:58 -0400
Message-ID: <a84788c9cb25deb928b126fc9368ab8e4e110deb.camel@kernel.org>
In-Reply-To: <adNJZBXeJomWmhdf@infradead.org>
On Sun, 2026-04-05 at 22:49 -0700, Christoph Hellwig wrote:
> On Thu, Apr 02, 2026 at 08:49:45AM -0400, Jeff Layton wrote:
> > > Have you considered not doing in-caller writeback for
> > > IOCB_DONTCACHE at all and just leaving it to the writeback daemon?
> > >
> > > Either by totally disabling the writeback and just leaving the
> > > dropbehind bit, or by queuing up wb_writeback_work instances for
> > > the ranges, or by just increasing the pressure for the writeback
> > > daemon. Note that with all schemes, including the one in this patch,
> > > we might eventually run into writeback scalability limits, which
> > > will require multiple writeback workers.
> >
> > I did test a "dropbehind" mode that just set the dropbehind bit without
> > doing the flush at the end of the write. It was better than stock
> > dontcache, but the tail latencies were still pretty bad.
> >
> > I think having each writer do some writeback submission work makes a
> > lot of sense. It helps keep the dirty pages below the dirty thresholds
> > and doesn't seem to tax each writing task _too_ much. The trick is
> > avoiding lock contention while doing it.
>
> Well, any time you hit a shared resource from multiple threads you
> create lock contention. Which is why in file system and writeback
> land we've moved away from random user processes hitting both data and
> metadata (e.g. XFS AIL) writeback as it leads to these scalability
> issues. At some point we might run out of steam in a single thread,
> although so far that's mostly been because it does stupid things
> (e.g. writeback on file systems doing complex allocator stuff).
>
> > I think what would be ideal would be to have some (lockless) mechanism
> > to say "there is enough data touched by the range just written to kick
> > off a write that's a suitable size for the backing store". Each writer
> > could check that and then kick off writeback for an appropriate range.
>
> And that is called the writeback thread. So what we should do there
> is to make sure we queue up writeback on it for each dontcache write.
> Initially, queuing up a wb_writeback_work for each range might be a
> first approximation, although we should probably find a way to just
> increase a threshold if going down that road.
>
Ok, I like that idea. I'll give that a shot and see how it does. I'll
note that there is no way to specify an inode or range (yet) in
struct wb_writeback_work.
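
For reference, the struct currently looks roughly like this (trimmed,
and paraphrased from memory of fs/fs-writeback.c, so the exact fields
may differ in your tree):

	struct wb_writeback_work {
		long nr_pages;
		struct super_block *sb;
		enum writeback_sync_modes sync_mode;
		unsigned int tagged_writepages:1;
		unsigned int for_kupdate:1;
		unsigned int range_cyclic:1;
		unsigned int for_background:1;
		unsigned int for_sync:1;
		unsigned int auto_free:1;
		enum wb_reason reason;
		struct list_head list;
		struct wb_completion *done;
	};

...so it can say "write back nr_pages pages from this sb", but nothing
finer-grained than a whole superblock.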
Do you think it's sufficient to just call something like
wakeup_flusher_threads_bdi() after every RWF_DONTCACHE write, or should
I extend struct wb_writeback_work to allow for doing work on a range
within a single inode?
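
For the simple option, here's a rough, untested sketch of what I have
in mind (the hook point in the write completion path and the wb_reason
value are just placeholders on my part):

	/* after completing an IOCB_DONTCACHE buffered write */
	if (iocb->ki_flags & IOCB_DONTCACHE) {
		struct backing_dev_info *bdi =
			inode_to_bdi(file_inode(iocb->ki_filp));

		/*
		 * Nudge the flusher thread instead of submitting
		 * writeback from the writing task itself.
		 */
		wakeup_flusher_threads_bdi(bdi, WB_REASON_PERIODIC);
	}

For the range option, I'd imagine growing struct wb_writeback_work
with something like the following (purely illustrative):

	struct inode *inode;	/* only write back this inode */
	loff_t range_start;	/* ...and only this byte range */
	loff_t range_end;

and teaching the worker side to honor those fields when they're set.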
> > I think this could even be beneficial in the normal buffered write
> > codepath too.
>
> Yes, we've had lots of observations that the current 30s timeout is
> actively harmful. Especially on SSDs, but even on HDDs just keeping
> the device active might make sense.
--
Jeff Layton <jlayton@kernel.org>