From: Jeff Layton <jlayton@kernel.org>
To: Alexander Viro <viro@zeniv.linux.org.uk>,
Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
"Matthew Wilcox (Oracle)" <willy@infradead.org>,
Andrew Morton <akpm@linux-foundation.org>,
David Hildenbrand <david@kernel.org>,
Lorenzo Stoakes <ljs@kernel.org>,
"Liam R. Howlett" <Liam.Howlett@oracle.com>,
Vlastimil Babka <vbabka@kernel.org>,
Mike Rapoport <rppt@kernel.org>,
Suren Baghdasaryan <surenb@google.com>,
Michal Hocko <mhocko@suse.com>,
Mike Snitzer <msnitzer@kernel.org>,
Chuck Lever <chuck.lever@oracle.com>
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-nfs@vger.kernel.org, linux-mm@kvack.org,
Jeff Layton <jlayton@kernel.org>
Subject: [PATCH 2/4] mm: add atomic flush guard for IOCB_DONTCACHE writeback
Date: Wed, 01 Apr 2026 15:10:59 -0400
Message-ID: <20260401-dontcache-v1-2-1f5746fab47a@kernel.org>
In-Reply-To: <20260401-dontcache-v1-0-1f5746fab47a@kernel.org>

When the PAGECACHE_TAG_WRITEBACK tag clears after a round of writeback
completes, all concurrent IOCB_DONTCACHE writers see the tag clear
simultaneously and submit their proportional flushes at once. The
result is a thundering herd that causes p99.9 tail latency spikes.

Add an AS_DONTCACHE_FLUSHING flag to the address_space and use
test_and_set_bit() to ensure at most one IOCB_DONTCACHE writer
flushes at a time. Other writers that find the bit set skip their
flush entirely. The bit is cleared when the flush completes.
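
As a rough illustration of the guard described above (not the kernel
code itself), the following userspace sketch uses a C11 atomic_flag in
place of test_and_set_bit() on mapping->flags. The names flushing,
do_flush() and guarded_flush() are hypothetical and exist only for this
example.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the AS_DONTCACHE_FLUSHING bit. */
static atomic_flag flushing = ATOMIC_FLAG_INIT;

/* Hypothetical stand-in for the real flush; it only counts invocations. */
static int flush_calls;

static int do_flush(void)
{
	flush_calls++;
	return 0;
}

/*
 * Returns true if this caller performed the flush, false if another
 * caller already held the flag and this one skipped.
 */
static bool guarded_flush(void)
{
	/* test-and-set returns the previous value: true means "already set" */
	if (atomic_flag_test_and_set(&flushing))
		return false;

	do_flush();

	/* Release the guard so the next writer can flush. */
	atomic_flag_clear(&flushing);
	return true;
}
```

A caller that loses the race returns immediately instead of waiting,
which is what keeps concurrent writers from piling up behind the flush.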

Together with the existing skip-if-busy check on
PAGECACHE_TAG_WRITEBACK (which provides temporal rate limiting by
skipping flushes while prior writeback is still draining), this
creates a two-level guard: the writeback tag paces flush frequency
to match device speed, while the atomic flag prevents the thundering
herd at tag-clear transitions.

Additionally, add a dirty pressure escape hatch: when dirty pages
exceed 75% of the dirty_ratio threshold, bypass the WRITEBACK tag
skip and attempt to flush anyway. Under heavy multi-writer load,
the skip-if-busy check can cause dirty pages to accumulate (most
writers skip because writeback is always in progress), eventually
triggering balance_dirty_pages() throttling with severe tail latency.
By forcing extra flushes when dirty pressure is high, dontcache
writers help drain dirty pages before the throttle threshold is hit.
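
The skip decision with the escape hatch can be sketched as a pure
function. This is only an illustration under stated assumptions:
should_skip_flush() is a hypothetical helper, and its arguments stand in
for mapping_tagged(), the NR_FILE_DIRTY count and the threshold returned
by global_dirty_limits(), all in pages.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: returns true when a dontcache writer should skip
 * its flush because writeback is already in progress and dirty pressure
 * is still below 75% of the threshold.
 */
static bool should_skip_flush(bool writeback_busy, unsigned long dirty,
			      unsigned long thresh)
{
	/* No writeback in flight: never skip on this check. */
	if (!writeback_busy)
		return false;

	/* Same integer arithmetic as the patch: skip below 3/4 of thresh. */
	return dirty < thresh * 3 / 4;
}
```

With a threshold of 1000 pages the cutoff is 750: a busy mapping with
749 dirty pages skips the flush, while one with 750 or more falls
through to the atomic flush guard.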

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
include/linux/pagemap.h | 1 +
mm/filemap.c | 36 +++++++++++++++++++++++++++++-------
2 files changed, 30 insertions(+), 7 deletions(-)
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 31a848485ad9d9850d37185418349b89e6efe420..e71bf75f6c22d0da5330c17c6e525cb12d254dfe 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -210,6 +210,7 @@ enum mapping_flags {
AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM = 9,
AS_KERNEL_FILE = 10, /* mapping for a fake kernel file that shouldn't
account usage to user cgroups */
+ AS_DONTCACHE_FLUSHING = 11, /* dontcache writeback in progress */
/* Bits 16-25 are used for FOLIO_ORDER */
AS_FOLIO_ORDER_BITS = 5,
AS_FOLIO_ORDER_MIN = 16,
diff --git a/mm/filemap.c b/mm/filemap.c
index af2024b736bef74571cc22ab7e3cde2c8e872efe..1b5577bd4eda8ad8ee182e58acd50d99f0a8f9f5 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -444,11 +444,21 @@ EXPORT_SYMBOL_GPL(filemap_flush_range);
* @end: last byte offset (inclusive) for writeback
* @nr_written: number of bytes just written by the caller
*
- * Rate-limited writeback for IOCB_DONTCACHE writes. Skips the flush
- * entirely if writeback is already in progress on the mapping (skip-if-busy),
- * and when flushing, caps nr_to_write to the number of pages just written
- * (proportional cap). Together these avoid writeback contention between
- * concurrent writers and prevent I/O bursts that starve readers.
+ * Rate-limited writeback for IOCB_DONTCACHE writes. Uses three guards to
+ * avoid writeback contention between concurrent writers:
+ *
+ * 1. Skip-if-busy: if writeback is already in progress on the mapping
+ * (PAGECACHE_TAG_WRITEBACK set), skip the flush — unless dirty pages
+ * are approaching the dirty_ratio threshold, in which case flush anyway
+ * to help drain before balance_dirty_pages() throttles all writers.
+ *
+ * 2. Atomic flush guard: use test_and_set_bit(AS_DONTCACHE_FLUSHING) so
+ * that at most one dontcache writer flushes at a time, preventing a
+ * thundering herd when the writeback tag clears and multiple writers
+ * try to flush simultaneously.
+ *
+ * 3. Proportional cap: cap nr_to_write to the number of pages just written,
+ * preventing any single flush from starving concurrent readers.
*
* Return: %0 on success, negative error code otherwise.
*/
@@ -456,13 +466,25 @@ int filemap_dontcache_writeback_range(struct address_space *mapping,
loff_t start, loff_t end, ssize_t nr_written)
{
long nr;
+ int ret;
+
+ if (mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK)) {
+ unsigned long thresh, bg_thresh, dirty;
- if (mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK))
+ global_dirty_limits(&bg_thresh, &thresh);
+ dirty = global_node_page_state(NR_FILE_DIRTY);
+ if (dirty < thresh * 3 / 4)
+ return 0;
+ }
+
+ if (test_and_set_bit(AS_DONTCACHE_FLUSHING, &mapping->flags))
return 0;
nr = (nr_written + PAGE_SIZE - 1) >> PAGE_SHIFT;
- return filemap_writeback(mapping, start, end, WB_SYNC_NONE, &nr,
+ ret = filemap_writeback(mapping, start, end, WB_SYNC_NONE, &nr,
WB_REASON_BACKGROUND);
+ clear_bit(AS_DONTCACHE_FLUSHING, &mapping->flags);
+ return ret;
}
EXPORT_SYMBOL_GPL(filemap_dontcache_writeback_range);
--
2.53.0
Thread overview: 16+ messages
2026-04-01 19:10 [PATCH 0/4] mm: improve write performance with RWF_DONTCACHE Jeff Layton
2026-04-01 19:10 ` [PATCH 1/4] mm: fix IOCB_DONTCACHE write performance with rate-limited writeback Jeff Layton
2026-04-02 4:43 ` Ritesh Harjani
2026-04-02 11:59 ` Jeff Layton
2026-04-02 12:40 ` Ritesh Harjani
2026-04-02 5:21 ` Christoph Hellwig
2026-04-02 12:28 ` Jeff Layton
2026-04-06 5:44 ` Christoph Hellwig
2026-04-01 19:10 ` Jeff Layton [this message]
2026-04-02 5:27 ` [PATCH 2/4] mm: add atomic flush guard for IOCB_DONTCACHE writeback Christoph Hellwig
2026-04-02 12:49 ` Jeff Layton
2026-04-06 5:49 ` Christoph Hellwig
2026-04-06 13:32 ` Jeff Layton
2026-04-07 5:19 ` Christoph Hellwig
2026-04-01 19:11 ` [PATCH 3/4] testing: add nfsd-io-bench NFS server benchmark suite Jeff Layton
2026-04-01 19:11 ` [PATCH 4/4] testing: add dontcache-bench local filesystem " Jeff Layton