From: Jeff Layton <jlayton@kernel.org>
To: Alexander Viro <viro@zeniv.linux.org.uk>,
Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
"Matthew Wilcox (Oracle)" <willy@infradead.org>,
Andrew Morton <akpm@linux-foundation.org>,
David Hildenbrand <david@kernel.org>,
Lorenzo Stoakes <ljs@kernel.org>,
"Liam R. Howlett" <Liam.Howlett@oracle.com>,
Vlastimil Babka <vbabka@kernel.org>,
Mike Rapoport <rppt@kernel.org>,
Suren Baghdasaryan <surenb@google.com>,
Michal Hocko <mhocko@suse.com>,
Mike Snitzer <msnitzer@kernel.org>,
Chuck Lever <chuck.lever@oracle.com>
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-nfs@vger.kernel.org, linux-mm@kvack.org,
Jeff Layton <jlayton@kernel.org>
Subject: [PATCH 2/4] mm: add atomic flush guard for IOCB_DONTCACHE writeback
Date: Wed, 01 Apr 2026 15:10:59 -0400
Message-ID: <20260401-dontcache-v1-2-1f5746fab47a@kernel.org>
In-Reply-To: <20260401-dontcache-v1-0-1f5746fab47a@kernel.org>

When the PAGECACHE_TAG_WRITEBACK tag clears after a round of writeback
completes, all concurrent IOCB_DONTCACHE writers see the tag clear
simultaneously and submit proportional flushes at once. This thundering
herd causes p99.9 tail latency spikes.

Add an AS_DONTCACHE_FLUSHING flag to the address_space and use
test_and_set_bit() to ensure at most one IOCB_DONTCACHE writer
flushes at a time. Other writers that find the bit set skip their
flush entirely. The bit is cleared once filemap_writeback() returns,
that is, after the WB_SYNC_NONE writeback has been submitted rather
than after the I/O itself completes.
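
A minimal userspace sketch of the guard's semantics, assuming C11
<stdatomic.h> as a stand-in for the kernel's test_and_set_bit() and
clear_bit() on mapping->flags. The helpers dontcache_flush_begin(),
dontcache_flush_end() and dontcache_guard_demo() are hypothetical names
used only for illustration; the demo models three writers that all
observe the tag clear before any of them has finished flushing.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical model of AS_DONTCACHE_FLUSHING: one atomic flag per mapping.
 * atomic_flag_test_and_set() returns the previous value, so "false" means
 * this caller just set the flag and now owns the flush.
 */
static bool dontcache_flush_begin(atomic_flag *flushing)
{
	return !atomic_flag_test_and_set(flushing);
}

/* Counterpart of clear_bit(): lets a later writer flush again. */
static void dontcache_flush_end(atomic_flag *flushing)
{
	atomic_flag_clear(flushing);
}

/* Three writers race at a tag-clear transition; returns how many flushed. */
static int dontcache_guard_demo(void)
{
	atomic_flag flushing = ATOMIC_FLAG_INIT;
	int flushers = 0;

	/* Nobody has called _end() yet, so only the first caller wins. */
	for (int writer = 0; writer < 3; writer++)
		if (dontcache_flush_begin(&flushing))
			flushers++;

	dontcache_flush_end(&flushing);

	/* Once the winner is done, the next writer can take the guard. */
	bool again = dontcache_flush_begin(&flushing);
	assert(again);
	(void)again;
	dontcache_flush_end(&flushing);

	return flushers;
}
```

In the patch, the work between "begin" and "end" is the
filemap_writeback() call, and writers that lose the race simply
return 0 instead of queueing behind the winner.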

Together with the existing skip-if-busy check on
PAGECACHE_TAG_WRITEBACK (which provides temporal rate limiting by
skipping flushes while prior writeback is still draining), this
creates a two-level guard: the writeback tag paces flush frequency
to match device speed, while the atomic flag prevents the thundering
herd at tag-clear transitions.
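
The two levels can be modelled as a pure decision function over
snapshots of the two pieces of state. This is an illustrative sketch,
not code from the patch: the enum and function names are hypothetical,
the first boolean stands in for mapping_tagged(mapping,
PAGECACHE_TAG_WRITEBACK) and the second for AS_DONTCACHE_FLUSHING
already being set. It leaves out the dirty-pressure escape hatch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcomes of the two-level guard. */
enum dontcache_action {
	DONTCACHE_FLUSH,	/* this writer submits a proportional flush */
	DONTCACHE_SKIP_BUSY,	/* level 1: prior writeback still draining */
	DONTCACHE_SKIP_GUARD,	/* level 2: another writer is already flushing */
};

/*
 * writeback_tagged models mapping_tagged(..., PAGECACHE_TAG_WRITEBACK);
 * guard_held models AS_DONTCACHE_FLUSHING being set by another writer.
 */
static enum dontcache_action dontcache_decide(bool writeback_tagged,
					      bool guard_held)
{
	if (writeback_tagged)
		return DONTCACHE_SKIP_BUSY;
	if (guard_held)
		return DONTCACHE_SKIP_GUARD;
	return DONTCACHE_FLUSH;
}

/* Walk one writeback cycle as seen by several writers. */
static void dontcache_decide_demo(void)
{
	/* While the device is draining, every writer backs off. */
	assert(dontcache_decide(true, false) == DONTCACHE_SKIP_BUSY);

	/* Tag clears: the first writer to take the guard flushes... */
	assert(dontcache_decide(false, false) == DONTCACHE_FLUSH);

	/* ...and the rest of the herd finds the guard held and skips. */
	assert(dontcache_decide(false, true) == DONTCACHE_SKIP_GUARD);
}
```

The model takes a snapshot of the guard; in the patch the check and
the set are one atomic test_and_set_bit(), which is what keeps level 2
race-free when many writers arrive at once.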

Additionally, add a dirty pressure escape hatch: when dirty pages
exceed 75% of the global dirty threshold (as computed by
global_dirty_limits() from dirty_ratio or dirty_bytes), bypass the
WRITEBACK tag skip and attempt to flush anyway, still subject to the
atomic flush guard. Under heavy multi-writer load, the skip-if-busy
check can cause dirty pages to accumulate (most writers skip because
writeback is always in progress), eventually triggering
balance_dirty_pages() throttling with severe tail latency. By forcing
extra flushes when dirty pressure is high, dontcache writers help
drain dirty pages before the throttle threshold is hit.
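
A small sketch of the escape-hatch arithmetic, assuming page-count
inputs like those the patch reads from
global_node_page_state(NR_FILE_DIRTY) and global_dirty_limits(). The
helper names and the 200000-page threshold are hypothetical; the
comparison itself mirrors the patch's "dirty < thresh * 3 / 4" test.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * True when dirty pages have reached 75% of the dirty threshold, i.e. when
 * the patch would NOT take the early "return 0" despite the writeback tag.
 * Both arguments are page counts.
 */
static bool dontcache_bypass_busy_skip(unsigned long dirty,
				       unsigned long thresh)
{
	/* Integer arithmetic: the cutoff truncates toward zero. */
	return dirty >= thresh * 3 / 4;
}

/* Level 1 of the guard with the escape hatch folded in. */
static bool dontcache_should_try_flush(bool writeback_tagged,
				       unsigned long dirty,
				       unsigned long thresh)
{
	if (!writeback_tagged)
		return true;	/* nothing draining: proceed to the atomic guard */
	return dontcache_bypass_busy_skip(dirty, thresh);
}

static void dontcache_pressure_demo(void)
{
	/* Hypothetical threshold of 200000 pages gives a cutoff of 150000. */
	assert(!dontcache_should_try_flush(true, 149999, 200000));
	assert(dontcache_should_try_flush(true, 150000, 200000));

	/* Truncation: 10 * 3 / 4 == 7, so 7 dirty pages already qualify. */
	assert(dontcache_bypass_busy_skip(7, 10));
	assert(!dontcache_bypass_busy_skip(6, 10));
}
```

Because both inputs are global counters, the hatch opens on
system-wide dirty state rather than on the dirty pages of this
mapping alone.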

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
include/linux/pagemap.h | 1 +
mm/filemap.c | 36 +++++++++++++++++++++++++++++-------
2 files changed, 30 insertions(+), 7 deletions(-)
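
The proportional cap in the filemap.c hunk below turns the byte count
into a page budget with a round-up shift,
nr = (nr_written + PAGE_SIZE - 1) >> PAGE_SHIFT, so a partial trailing
page still earns one page of writeback. A standalone sketch of that
arithmetic, assuming 4 KiB pages (PAGE_SHIFT == 12); the MODEL_*
macros and dontcache_nr_pages() are hypothetical names:

```c
#include <assert.h>

/* Assumption: 4 KiB pages. The kernel's PAGE_SHIFT is per-architecture. */
#define MODEL_PAGE_SHIFT 12
#define MODEL_PAGE_SIZE  (1L << MODEL_PAGE_SHIFT)

/* Round a non-negative byte count up to whole pages. */
static long dontcache_nr_pages(long nr_written)
{
	return (nr_written + MODEL_PAGE_SIZE - 1) >> MODEL_PAGE_SHIFT;
}

static void dontcache_nr_pages_demo(void)
{
	assert(dontcache_nr_pages(0) == 0);
	assert(dontcache_nr_pages(1) == 1);		/* partial page rounds up */
	assert(dontcache_nr_pages(4096) == 1);
	assert(dontcache_nr_pages(4097) == 2);
	assert(dontcache_nr_pages(1 << 20) == 256);	/* 1 MiB write */
}
```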
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 31a848485ad9d9850d37185418349b89e6efe420..e71bf75f6c22d0da5330c17c6e525cb12d254dfe 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -210,6 +210,7 @@ enum mapping_flags {
AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM = 9,
AS_KERNEL_FILE = 10, /* mapping for a fake kernel file that shouldn't
account usage to user cgroups */
+ AS_DONTCACHE_FLUSHING = 11, /* dontcache writeback in progress */
/* Bits 16-25 are used for FOLIO_ORDER */
AS_FOLIO_ORDER_BITS = 5,
AS_FOLIO_ORDER_MIN = 16,
diff --git a/mm/filemap.c b/mm/filemap.c
index af2024b736bef74571cc22ab7e3cde2c8e872efe..1b5577bd4eda8ad8ee182e58acd50d99f0a8f9f5 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -444,11 +444,21 @@ EXPORT_SYMBOL_GPL(filemap_flush_range);
* @end: last byte offset (inclusive) for writeback
* @nr_written: number of bytes just written by the caller
*
- * Rate-limited writeback for IOCB_DONTCACHE writes. Skips the flush
- * entirely if writeback is already in progress on the mapping (skip-if-busy),
- * and when flushing, caps nr_to_write to the number of pages just written
- * (proportional cap). Together these avoid writeback contention between
- * concurrent writers and prevent I/O bursts that starve readers.
+ * Rate-limited writeback for IOCB_DONTCACHE writes. Uses three guards to
+ * avoid writeback contention between concurrent writers:
+ *
+ * 1. Skip-if-busy: if writeback is already in progress on the mapping
+ * (PAGECACHE_TAG_WRITEBACK set), skip the flush — unless dirty pages
+ * are approaching the dirty_ratio threshold, in which case flush anyway
+ * to help drain before balance_dirty_pages() throttles all writers.
+ *
+ * 2. Atomic flush guard: use test_and_set_bit(AS_DONTCACHE_FLUSHING) so
+ * that at most one dontcache writer flushes at a time, preventing a
+ * thundering herd when the writeback tag clears and multiple writers
+ * try to flush simultaneously.
+ *
+ * 3. Proportional cap: cap nr_to_write to the number of pages just written,
+ * preventing any single flush from starving concurrent readers.
*
* Return: %0 on success, negative error code otherwise.
*/
@@ -456,13 +466,25 @@ int filemap_dontcache_writeback_range(struct address_space *mapping,
loff_t start, loff_t end, ssize_t nr_written)
{
long nr;
+ int ret;
+
+ if (mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK)) {
+ unsigned long thresh, bg_thresh, dirty;
- if (mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK))
+ global_dirty_limits(&bg_thresh, &thresh);
+ dirty = global_node_page_state(NR_FILE_DIRTY);
+ if (dirty < thresh * 3 / 4)
+ return 0;
+ }
+
+ if (test_and_set_bit(AS_DONTCACHE_FLUSHING, &mapping->flags))
return 0;
nr = (nr_written + PAGE_SIZE - 1) >> PAGE_SHIFT;
- return filemap_writeback(mapping, start, end, WB_SYNC_NONE, &nr,
+ ret = filemap_writeback(mapping, start, end, WB_SYNC_NONE, &nr,
WB_REASON_BACKGROUND);
+ clear_bit(AS_DONTCACHE_FLUSHING, &mapping->flags);
+ return ret;
}
EXPORT_SYMBOL_GPL(filemap_dontcache_writeback_range);
--
2.53.0
Thread overview: 15+ messages
2026-04-01 19:10 [PATCH 0/4] mm: improve write performance with RWF_DONTCACHE Jeff Layton
2026-04-01 19:10 ` [PATCH 1/4] mm: fix IOCB_DONTCACHE write performance with rate-limited writeback Jeff Layton
2026-04-02 4:43 ` Ritesh Harjani
2026-04-02 11:59 ` Jeff Layton
2026-04-02 12:40 ` Ritesh Harjani
2026-04-02 5:21 ` Christoph Hellwig
2026-04-02 12:28 ` Jeff Layton
2026-04-06 5:44 ` Christoph Hellwig
2026-04-01 19:10 ` Jeff Layton [this message]
2026-04-02 5:27 ` [PATCH 2/4] mm: add atomic flush guard for IOCB_DONTCACHE writeback Christoph Hellwig
2026-04-02 12:49 ` Jeff Layton
2026-04-06 5:49 ` Christoph Hellwig
2026-04-06 13:32 ` Jeff Layton
2026-04-01 19:11 ` [PATCH 3/4] testing: add nfsd-io-bench NFS server benchmark suite Jeff Layton
2026-04-01 19:11 ` [PATCH 4/4] testing: add dontcache-bench local filesystem " Jeff Layton