public inbox for linux-mm@kvack.org
 help / color / mirror / Atom feed
From: Jens Axboe <axboe@kernel.dk>
To: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org
Cc: hannes@cmpxchg.org, clm@meta.com, linux-kernel@vger.kernel.org,
	willy@infradead.org, Jens Axboe <axboe@kernel.dk>
Subject: [PATCH 10/15] mm/filemap: make buffered writes work with RWF_UNCACHED
Date: Sun, 10 Nov 2024 08:28:02 -0700	[thread overview]
Message-ID: <20241110152906.1747545-11-axboe@kernel.dk> (raw)
In-Reply-To: <20241110152906.1747545-1-axboe@kernel.dk>

If RWF_UNCACHED is set for a write, mark new folios being written with
uncached. This is done by passing in the fact that it's an uncached write
through the folio pointer. We can only get there when IOCB_UNCACHED was
allowed, which can only happen if the file system opts in. Opting in means
they need to check for the LSB in the folio pointer to know if it's an
uncached write or not. If it is, then FGP_UNCACHED should be used if
creating new folios is necessary.

Uncached writes will drop any folios they create upon writeback
completion, but leave folios that may exist in that range alone. Since
->write_begin() doesn't currently take any flags, and to avoid needing
to change the callback kernel wide, use the foliop being passed in to
->write_begin() to signal if this is an uncached write or not. File
systems can then use that to mark newly created folios as uncached.

Add a helper, generic_uncached_write(), that generic_file_write_iter()
calls upon successful completion of an uncached write.

This provides similar benefits to using RWF_UNCACHED with reads. Testing
buffered writes on 32 files:

writing bs 65536, uncached 0
  1s: 196035MB/sec, MB=196035
  2s: 132308MB/sec, MB=328147
  3s: 132438MB/sec, MB=460586
  4s: 116528MB/sec, MB=577115
  5s: 103898MB/sec, MB=681014
  6s: 108893MB/sec, MB=789907
  7s: 99678MB/sec, MB=889586
  8s: 106545MB/sec, MB=996132
  9s: 106826MB/sec, MB=1102958
 10s: 101544MB/sec, MB=1204503
 11s: 111044MB/sec, MB=1315548
 12s: 124257MB/sec, MB=1441121
 13s: 116031MB/sec, MB=1557153
 14s: 114540MB/sec, MB=1671694
 15s: 115011MB/sec, MB=1786705
 16s: 115260MB/sec, MB=1901966
 17s: 116068MB/sec, MB=2018034
 18s: 116096MB/sec, MB=2134131

where it's quite obvious where the page cache filled, and performance
dropped from to about half of where it started, settling in at around
115GB/sec. Meanwhile, 32 kswapds were running full steam trying to
reclaim pages.

Running the same test with uncached buffered writes:

writing bs 65536, uncached 1
  1s: 198974MB/sec
  2s: 189618MB/sec
  3s: 193601MB/sec
  4s: 188582MB/sec
  5s: 193487MB/sec
  6s: 188341MB/sec
  7s: 194325MB/sec
  8s: 188114MB/sec
  9s: 192740MB/sec
 10s: 189206MB/sec
 11s: 193442MB/sec
 12s: 189659MB/sec
 13s: 191732MB/sec
 14s: 190701MB/sec
 15s: 191789MB/sec
 16s: 191259MB/sec
 17s: 190613MB/sec
 18s: 191951MB/sec

and the behavior is fully predictable, performing the same throughout
even after the page cache would otherwise have fully filled with dirty
data. It's also about 65% faster, and using half the CPU of the system
compared to the normal buffered write.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 include/linux/pagemap.h | 29 +++++++++++++++++++++++++++++
 mm/filemap.c            | 26 +++++++++++++++++++++++---
 2 files changed, 52 insertions(+), 3 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 0122b3fbe2ac..5469664f66c3 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -14,6 +14,7 @@
 #include <linux/gfp.h>
 #include <linux/bitops.h>
 #include <linux/hardirq.h> /* for in_interrupt() */
+#include <linux/writeback.h>
 #include <linux/hugetlb_inline.h>
 
 struct folio_batch;
@@ -70,6 +71,34 @@ static inline int filemap_write_and_wait(struct address_space *mapping)
 	return filemap_write_and_wait_range(mapping, 0, LLONG_MAX);
 }
 
+/*
+ * generic_uncached_write - start uncached writeback
+ * @iocb: the iocb that was written
+ * @written: the amount of bytes written
+ *
+ * When writeback has been handled by write_iter, this helper should be called
+ * if the file system supports uncached writes. If %IOCB_UNCACHED is set, it
+ * will kick off writeback for the specified range.
+ */
+static inline void generic_uncached_write(struct kiocb *iocb, ssize_t written)
+{
+	if (iocb->ki_flags & IOCB_UNCACHED) {
+		struct address_space *mapping = iocb->ki_filp->f_mapping;
+
+		/* kick off uncached writeback */
+		__filemap_fdatawrite_range(mapping, iocb->ki_pos,
+					   iocb->ki_pos + written, WB_SYNC_NONE);
+	}
+}
+
+/*
+ * Value passed in to ->write_begin() if IOCB_UNCACHED is set for the write,
+ * and the ->write_begin() handler on a file system supporting FOP_UNCACHED
+ * must check for this and pass FGP_UNCACHED for folio creation.
+ */
+#define foliop_uncached			((struct folio *) 0xfee1c001)
+#define foliop_is_uncached(foliop)	(*(foliop) == foliop_uncached)
+
 /**
  * filemap_set_wb_err - set a writeback error on an address_space
  * @mapping: mapping in which to set writeback error
diff --git a/mm/filemap.c b/mm/filemap.c
index efd02b047541..cfbfc8b14b1f 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -430,6 +430,7 @@ int __filemap_fdatawrite_range(struct address_space *mapping, loff_t start,
 
 	return filemap_fdatawrite_wbc(mapping, &wbc);
 }
+EXPORT_SYMBOL_GPL(__filemap_fdatawrite_range);
 
 static inline int __filemap_fdatawrite(struct address_space *mapping,
 	int sync_mode)
@@ -1609,7 +1610,14 @@ static void folio_end_uncached(struct folio *folio)
 {
 	bool reset = true;
 
-	if (folio_trylock(folio)) {
+	/*
+	 * Hitting !in_task() should not happen off RWF_UNCACHED writeback, but
+	 * can happen if normal writeback just happens to find dirty folios
+	 * that were created as part of uncached writeback, and that writeback
+	 * would otherwise not need non-IRQ handling. Just skip the
+	 * invalidation in that case.
+	 */
+	if (in_task() && folio_trylock(folio)) {
 		reset = !invalidate_complete_folio2(folio->mapping, folio, 0);
 		folio_unlock(folio);
 	}
@@ -4061,7 +4069,7 @@ ssize_t generic_perform_write(struct kiocb *iocb, struct iov_iter *i)
 	ssize_t written = 0;
 
 	do {
-		struct folio *folio;
+		struct folio *folio = NULL;
 		size_t offset;		/* Offset into folio */
 		size_t bytes;		/* Bytes to write to folio */
 		size_t copied;		/* Bytes copied from user */
@@ -4089,6 +4097,16 @@ ssize_t generic_perform_write(struct kiocb *iocb, struct iov_iter *i)
 			break;
 		}
 
+		/*
+		 * If IOCB_UNCACHED is set here, we now the file system
+		 * supports it. And hence it'll know to check folip for being
+		 * set to this magic value. If so, it's an uncached write.
+		 * Whenever ->write_begin() changes prototypes again, this
+		 * can go away and just pass iocb or iocb flags.
+		 */
+		if (iocb->ki_flags & IOCB_UNCACHED)
+			folio = foliop_uncached;
+
 		status = a_ops->write_begin(file, mapping, pos, bytes,
 						&folio, &fsdata);
 		if (unlikely(status < 0))
@@ -4219,8 +4237,10 @@ ssize_t generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
 		ret = __generic_file_write_iter(iocb, from);
 	inode_unlock(inode);
 
-	if (ret > 0)
+	if (ret > 0) {
+		generic_uncached_write(iocb, ret);
 		ret = generic_write_sync(iocb, ret);
+	}
 	return ret;
 }
 EXPORT_SYMBOL(generic_file_write_iter);
-- 
2.45.2



  parent reply	other threads:[~2024-11-10 15:29 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-11-10 15:27 [PATCHSET v2 0/15] Uncached buffered IO Jens Axboe
2024-11-10 15:27 ` [PATCH 01/15] mm/filemap: change filemap_create_folio() to take a struct kiocb Jens Axboe
2024-11-10 15:27 ` [PATCH 02/15] mm/readahead: add folio allocation helper Jens Axboe
2024-11-10 15:27 ` [PATCH 03/15] mm: add PG_uncached page flag Jens Axboe
2024-11-10 15:27 ` [PATCH 04/15] mm/readahead: add readahead_control->uncached member Jens Axboe
2024-11-10 15:27 ` [PATCH 05/15] mm/filemap: use page_cache_sync_ra() to kick off read-ahead Jens Axboe
2024-11-10 15:27 ` [PATCH 06/15] mm/truncate: make invalidate_complete_folio2() public Jens Axboe
2024-11-10 15:27 ` [PATCH 07/15] fs: add RWF_UNCACHED iocb and FOP_UNCACHED file_operations flag Jens Axboe
2024-11-10 15:28 ` [PATCH 08/15] mm/filemap: add read support for RWF_UNCACHED Jens Axboe
2024-11-11  9:15   ` Kirill A. Shutemov
2024-11-11 14:12     ` Jens Axboe
2024-11-11 15:16       ` Christoph Hellwig
2024-11-11 15:17         ` Jens Axboe
2024-11-11 17:09           ` Jens Axboe
2024-11-11 23:42             ` Jens Axboe
2024-11-12  5:13               ` Christoph Hellwig
2024-11-12 15:14                 ` Jens Axboe
2024-11-12 16:39                   ` Brian Foster
2024-11-12 17:06                     ` Jens Axboe
2024-11-12 17:19                       ` Jens Axboe
2024-11-12 18:44                         ` Brian Foster
2024-11-12 19:08                           ` Jens Axboe
2024-11-12 19:39                             ` Brian Foster
2024-11-12 19:45                               ` Jens Axboe
2024-11-12 20:21                                 ` Brian Foster
2024-11-12 20:25                                   ` Jens Axboe
2024-11-13 14:07                                     ` Jens Axboe
2024-11-11 15:25       ` Kirill A. Shutemov
2024-11-11 15:31         ` Jens Axboe
2024-11-11 15:51           ` Kirill A. Shutemov
2024-11-11 15:57             ` Jens Axboe
2024-11-11 16:29               ` Kirill A. Shutemov
2024-11-10 15:28 ` [PATCH 09/15] mm/filemap: drop uncached pages when writeback completes Jens Axboe
2024-11-11  9:17   ` Kirill A. Shutemov
2024-11-10 15:28 ` Jens Axboe [this message]
2024-11-10 15:28 ` [PATCH 11/15] mm: add FGP_UNCACHED folio creation flag Jens Axboe
2024-11-10 15:28 ` [PATCH 12/15] ext4: add RWF_UNCACHED write support Jens Axboe
2024-11-10 15:28 ` [PATCH 13/15] iomap: make buffered writes work with RWF_UNCACHED Jens Axboe
2024-11-10 15:28 ` [PATCH 14/15] xfs: punt uncached write completions to the completion wq Jens Axboe
2024-11-10 15:28 ` [PATCH 15/15] xfs: flag as supporting FOP_UNCACHED Jens Axboe
2024-11-11 15:27   ` Christoph Hellwig
2024-11-11 15:33     ` Jens Axboe
2024-11-11 17:25 ` [PATCHSET v2 0/15] Uncached buffered IO Matthew Wilcox
2024-11-11 17:39   ` Jens Axboe
2024-11-11 21:24   ` Yu Zhao
2024-11-11 21:48     ` Matthew Wilcox
2024-11-11 22:07       ` Yu Zhao
2024-11-20 23:11         ` Yuanchu Xie

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20241110152906.1747545-11-axboe@kernel.dk \
    --to=axboe@kernel.dk \
    --cc=clm@meta.com \
    --cc=hannes@cmpxchg.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox