From: Jeff Layton <jlayton@kernel.org>
To: Alexander Viro <viro@zeniv.linux.org.uk>,
	 Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
	 "Matthew Wilcox (Oracle)" <willy@infradead.org>,
	 Andrew Morton <akpm@linux-foundation.org>,
	 David Hildenbrand <david@kernel.org>,
	Lorenzo Stoakes <ljs@kernel.org>,
	 "Liam R. Howlett" <Liam.Howlett@oracle.com>,
	 Vlastimil Babka <vbabka@kernel.org>,
	Mike Rapoport <rppt@kernel.org>,
	 Suren Baghdasaryan <surenb@google.com>,
	Michal Hocko <mhocko@suse.com>,
	 Mike Snitzer <msnitzer@kernel.org>,
	Chuck Lever <chuck.lever@oracle.com>
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	 linux-nfs@vger.kernel.org, linux-mm@kvack.org,
	 Jeff Layton <jlayton@kernel.org>
Subject: [PATCH 2/4] mm: add atomic flush guard for IOCB_DONTCACHE writeback
Date: Wed, 01 Apr 2026 15:10:59 -0400
Message-ID: <20260401-dontcache-v1-2-1f5746fab47a@kernel.org>
In-Reply-To: <20260401-dontcache-v1-0-1f5746fab47a@kernel.org>

When the PAGECACHE_TAG_WRITEBACK tag clears after a round of writeback
completes, all concurrent IOCB_DONTCACHE writers see it clear at the
same time and submit their proportional flushes at once.  This
thundering herd causes p99.9 tail latency spikes.

Add an AS_DONTCACHE_FLUSHING flag to the address_space and use
test_and_set_bit() to ensure at most one IOCB_DONTCACHE writer
flushes at a time.  Other writers that find the bit set skip their
flush entirely.  The bit is cleared when the flush completes.

Together with the existing skip-if-busy check on
PAGECACHE_TAG_WRITEBACK (which provides temporal rate limiting by
skipping flushes while prior writeback is still draining), this
creates a two-level guard: the writeback tag paces flush frequency
to match device speed, while the atomic flag prevents the thundering
herd at tag-clear transitions.

Additionally, add a dirty pressure escape hatch: when the global
NR_FILE_DIRTY count reaches 75% of the dirty threshold reported by
global_dirty_limits(), bypass the WRITEBACK tag skip and attempt to
flush anyway.  Under heavy multi-writer load, the skip-if-busy check
can let dirty pages accumulate, because most writers skip while
writeback is always in progress.  That eventually triggers
balance_dirty_pages() throttling, with severe tail latency.  By
forcing extra flushes when dirty pressure is high, dontcache writers
help drain dirty pages before the throttle threshold is hit.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
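For illustration only, the atomic flush guard described above can be
modeled in userspace with C11 atomics.  This is a sketch, not the
kernel code: struct model_mapping, model_flush_begin() and
model_flush_end() are hypothetical names, and atomic_exchange() stands
in for test_and_set_bit() on AS_DONTCACHE_FLUSHING.

```c
/* Userspace model of the flush guard; all names here are hypothetical. */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for the AS_DONTCACHE_FLUSHING bit in mapping->flags. */
struct model_mapping {
	atomic_bool flushing;
	int flushes_started;	/* number of writers that won the guard */
};

/*
 * Try to become the single flusher.  atomic_exchange() returns the
 * previous value, as test_and_set_bit() does, so a true result means
 * another writer already holds the guard and the caller should skip.
 */
bool model_flush_begin(struct model_mapping *m)
{
	if (atomic_exchange(&m->flushing, true))
		return false;
	m->flushes_started++;	/* only the guard holder gets here */
	return true;
}

/* Release the guard so that a later writer may flush. */
void model_flush_end(struct model_mapping *m)
{
	assert(atomic_load(&m->flushing));	/* must be held by the caller */
	atomic_store(&m->flushing, false);
}
```

A writer calls model_flush_begin(); if it returns false the writer
skips its flush, otherwise it flushes and then calls
model_flush_end().  Losers do not wait or retry, which is what keeps
the herd from queueing up behind the winner.
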
 include/linux/pagemap.h |  1 +
 mm/filemap.c            | 36 +++++++++++++++++++++++++++++-------
 2 files changed, 30 insertions(+), 7 deletions(-)
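
The dirty pressure escape hatch reduces to one integer comparison.  A
minimal userspace sketch, assuming the caller already has the dirty
page count and the dirty threshold in hand (in the patch these come
from global_node_page_state(NR_FILE_DIRTY) and global_dirty_limits());
model_should_skip_flush() is a hypothetical name:

```c
/* Userspace model of the skip-if-busy check with its escape hatch. */
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/*
 * Return true if the flush should be skipped: writeback is already in
 * progress and the dirty count is still below 3/4 of the threshold.
 * Both counts are in pages.
 */
bool model_should_skip_flush(bool writeback_in_progress,
			     unsigned long dirty, unsigned long thresh)
{
	assert(thresh <= ULONG_MAX / 3);	/* thresh * 3 must not wrap */

	if (!writeback_in_progress)
		return false;
	/* Integer arithmetic: with thresh = 10 the cutoff is 30 / 4 = 7. */
	return dirty < thresh * 3 / 4;
}
```

With a threshold of 1000 pages the cutoff is 750: a writer that sees
writeback in progress skips at 749 dirty pages and proceeds to the
flush guard at 750.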

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 31a848485ad9d9850d37185418349b89e6efe420..e71bf75f6c22d0da5330c17c6e525cb12d254dfe 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -210,6 +210,7 @@ enum mapping_flags {
 	AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM = 9,
 	AS_KERNEL_FILE = 10,	/* mapping for a fake kernel file that shouldn't
 				   account usage to user cgroups */
+	AS_DONTCACHE_FLUSHING = 11, /* dontcache writeback in progress */
 	/* Bits 16-25 are used for FOLIO_ORDER */
 	AS_FOLIO_ORDER_BITS = 5,
 	AS_FOLIO_ORDER_MIN = 16,
diff --git a/mm/filemap.c b/mm/filemap.c
index af2024b736bef74571cc22ab7e3cde2c8e872efe..1b5577bd4eda8ad8ee182e58acd50d99f0a8f9f5 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -444,11 +444,21 @@ EXPORT_SYMBOL_GPL(filemap_flush_range);
  * @end:	last byte offset (inclusive) for writeback
  * @nr_written:	number of bytes just written by the caller
  *
- * Rate-limited writeback for IOCB_DONTCACHE writes.  Skips the flush
- * entirely if writeback is already in progress on the mapping (skip-if-busy),
- * and when flushing, caps nr_to_write to the number of pages just written
- * (proportional cap).  Together these avoid writeback contention between
- * concurrent writers and prevent I/O bursts that starve readers.
+ * Rate-limited writeback for IOCB_DONTCACHE writes.  Uses three guards to
+ * avoid writeback contention between concurrent writers:
+ *
+ *  1. Skip-if-busy: if writeback is already in progress on the mapping
+ *     (PAGECACHE_TAG_WRITEBACK set), skip the flush — unless dirty pages
+ *     are approaching the dirty_ratio threshold, in which case flush anyway
+ *     to help drain before balance_dirty_pages() throttles all writers.
+ *
+ *  2. Atomic flush guard: use test_and_set_bit(AS_DONTCACHE_FLUSHING) so
+ *     that at most one dontcache writer flushes at a time, preventing a
+ *     thundering herd when the writeback tag clears and multiple writers
+ *     try to flush simultaneously.
+ *
+ *  3. Proportional cap: cap nr_to_write to the number of pages just written,
+ *     preventing any single flush from starving concurrent readers.
  *
  * Return: %0 on success, negative error code otherwise.
  */
@@ -456,13 +466,25 @@ int filemap_dontcache_writeback_range(struct address_space *mapping,
 		loff_t start, loff_t end, ssize_t nr_written)
 {
 	long nr;
+	int ret;
+
+	if (mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK)) {
+		unsigned long thresh, bg_thresh, dirty;
 
-	if (mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK))
+		global_dirty_limits(&bg_thresh, &thresh);
+		dirty = global_node_page_state(NR_FILE_DIRTY);
+		if (dirty < thresh * 3 / 4)
+			return 0;
+	}
+
+	if (test_and_set_bit(AS_DONTCACHE_FLUSHING, &mapping->flags))
 		return 0;
 
 	nr = (nr_written + PAGE_SIZE - 1) >> PAGE_SHIFT;
-	return filemap_writeback(mapping, start, end, WB_SYNC_NONE, &nr,
+	ret = filemap_writeback(mapping, start, end, WB_SYNC_NONE, &nr,
 			WB_REASON_BACKGROUND);
+	clear_bit(AS_DONTCACHE_FLUSHING, &mapping->flags);
+	return ret;
 }
 EXPORT_SYMBOL_GPL(filemap_dontcache_writeback_range);
 

-- 
2.53.0

