From: "Darrick J. Wong" <djwong@kernel.org>
To: Jens Axboe <axboe@kernel.dk>
Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
hannes@cmpxchg.org, clm@meta.com, linux-kernel@vger.kernel.org,
willy@infradead.org, kirill@shutemov.name, bfoster@redhat.com
Subject: Re: [PATCH 11/12] mm/filemap: make buffered writes work with RWF_UNCACHED
Date: Fri, 6 Dec 2024 09:17:40 -0800 [thread overview]
Message-ID: <20241206171740.GD7820@frogsfrogsfrogs> (raw)
In-Reply-To: <20241203153232.92224-13-axboe@kernel.dk>
On Tue, Dec 03, 2024 at 08:31:47AM -0700, Jens Axboe wrote:
> If RWF_UNCACHED is set for a write, mark new folios being written with
> uncached. This is done by passing in the fact that it's an uncached write
> through the folio pointer. We can only get there when IOCB_UNCACHED was
> allowed, which can only happen if the file system opts in. Opting in means
> they need to check for the LSB in the folio pointer to know if it's an
> uncached write or not. If it is, then FGP_UNCACHED should be used if
> creating new folios is necessary.
>
> Uncached writes will drop any folios they create upon writeback
> completion, but leave folios that may exist in that range alone. Since
> ->write_begin() doesn't currently take any flags, and to avoid needing
> to change the callback kernel wide, use the foliop being passed in to
> ->write_begin() to signal if this is an uncached write or not. File
> systems can then use that to mark newly created folios as uncached.
>
> This provides similar benefits to using RWF_UNCACHED with reads. Testing
> buffered writes on 32 files:
>
> writing bs 65536, uncached 0
> 1s: 196035MB/sec
> 2s: 132308MB/sec
> 3s: 132438MB/sec
> 4s: 116528MB/sec
> 5s: 103898MB/sec
> 6s: 108893MB/sec
> 7s: 99678MB/sec
> 8s: 106545MB/sec
> 9s: 106826MB/sec
> 10s: 101544MB/sec
> 11s: 111044MB/sec
> 12s: 124257MB/sec
> 13s: 116031MB/sec
> 14s: 114540MB/sec
> 15s: 115011MB/sec
> 16s: 115260MB/sec
> 17s: 116068MB/sec
> 18s: 116096MB/sec
>
> where it's quite obvious where the page cache filled, and performance
> dropped from to about half of where it started, settling in at around
> 115GB/sec. Meanwhile, 32 kswapds were running full steam trying to
> reclaim pages.
>
> Running the same test with uncached buffered writes:
>
> writing bs 65536, uncached 1
> 1s: 198974MB/sec
> 2s: 189618MB/sec
> 3s: 193601MB/sec
> 4s: 188582MB/sec
> 5s: 193487MB/sec
> 6s: 188341MB/sec
> 7s: 194325MB/sec
> 8s: 188114MB/sec
> 9s: 192740MB/sec
> 10s: 189206MB/sec
> 11s: 193442MB/sec
> 12s: 189659MB/sec
> 13s: 191732MB/sec
> 14s: 190701MB/sec
> 15s: 191789MB/sec
> 16s: 191259MB/sec
> 17s: 190613MB/sec
> 18s: 191951MB/sec
>
> and the behavior is fully predictable, performing the same throughout
> even after the page cache would otherwise have fully filled with dirty
> data. It's also about 65% faster, and using half the CPU of the system
> compared to the normal buffered write.
>
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> ---
> include/linux/fs.h | 5 +++++
> include/linux/pagemap.h | 9 +++++++++
> mm/filemap.c | 12 +++++++++++-
> 3 files changed, 25 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 40383f5cc6a2..32255473f79d 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -2912,6 +2912,11 @@ static inline ssize_t generic_write_sync(struct kiocb *iocb, ssize_t count)
> (iocb->ki_flags & IOCB_SYNC) ? 0 : 1);
> if (ret)
> return ret;
> + } else if (iocb->ki_flags & IOCB_UNCACHED) {
> + struct address_space *mapping = iocb->ki_filp->f_mapping;
> +
> + filemap_fdatawrite_range_kick(mapping, iocb->ki_pos,
> + iocb->ki_pos + count);
> }
>
> return count;
> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> index f2d49dccb7c1..e49587c40157 100644
> --- a/include/linux/pagemap.h
> +++ b/include/linux/pagemap.h
> @@ -14,6 +14,7 @@
> #include <linux/gfp.h>
> #include <linux/bitops.h>
> #include <linux/hardirq.h> /* for in_interrupt() */
> +#include <linux/writeback.h>
> #include <linux/hugetlb_inline.h>
>
> struct folio_batch;
> @@ -70,6 +71,14 @@ static inline int filemap_write_and_wait(struct address_space *mapping)
> return filemap_write_and_wait_range(mapping, 0, LLONG_MAX);
> }
>
> +/*
> + * Value passed in to ->write_begin() if IOCB_UNCACHED is set for the write,
> + * and the ->write_begin() handler on a file system supporting FOP_UNCACHED
> + * must check for this and pass FGP_UNCACHED for folio creation.
> + */
> +#define foliop_uncached ((struct folio *) 0xfee1c001)
> +#define foliop_is_uncached(foliop) (*(foliop) == foliop_uncached)
Honestly, I'm not a fan of foliop_uncached or foliop_is_uncached.
The first one because it's a magic value and can you guarantee that
0xfee1c001 will never be a pointer to an actual struct folio, even on
32-bit?
Second, they're both named "foliop" even though the first one doesn't
return a (struct folio **) but the second one takes that as an arg.
I think these two macros are only used for ext4 (or really, !iomap)
support, right? And that's only to avoid messing with ->write_begin?
What if you dropped ext4 support instead? :D
--D
> /**
> * filemap_set_wb_err - set a writeback error on an address_space
> * @mapping: mapping in which to set writeback error
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 826df99e294f..00f3c6c58629 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -4095,7 +4095,7 @@ ssize_t generic_perform_write(struct kiocb *iocb, struct iov_iter *i)
> ssize_t written = 0;
>
> do {
> - struct folio *folio;
> + struct folio *folio = NULL;
> size_t offset; /* Offset into folio */
> size_t bytes; /* Bytes to write to folio */
> size_t copied; /* Bytes copied from user */
> @@ -4123,6 +4123,16 @@ ssize_t generic_perform_write(struct kiocb *iocb, struct iov_iter *i)
> break;
> }
>
> + /*
> + * If IOCB_UNCACHED is set here, we now the file system
> + * supports it. And hence it'll know to check folip for being
> + * set to this magic value. If so, it's an uncached write.
> + * Whenever ->write_begin() changes prototypes again, this
> + * can go away and just pass iocb or iocb flags.
> + */
> + if (iocb->ki_flags & IOCB_UNCACHED)
> + folio = foliop_uncached;
> +
> status = a_ops->write_begin(file, mapping, pos, bytes,
> &folio, &fsdata);
> if (unlikely(status < 0))
> --
> 2.45.2
>
>
next prev parent reply other threads:[~2024-12-06 17:17 UTC|newest]
Thread overview: 43+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-12-03 15:31 [PATCHSET v6 0/12] Uncached buffered IO Jens Axboe
2024-12-03 15:31 ` [PATCH 01/12] mm/filemap: change filemap_create_folio() to take a struct kiocb Jens Axboe
2024-12-10 11:13 ` Christoph Hellwig
2024-12-12 15:49 ` Jens Axboe
2024-12-03 15:31 ` [PATCH 02/12] mm/readahead: add folio allocation helper Jens Axboe
2024-12-03 15:31 ` [PATCH 03/12] mm: add PG_uncached page flag Jens Axboe
2024-12-03 15:31 ` [PATCH 04/12] mm/readahead: add readahead_control->uncached member Jens Axboe
2024-12-03 15:31 ` [PATCH 05/12] mm/filemap: use page_cache_sync_ra() to kick off read-ahead Jens Axboe
2024-12-10 11:15 ` Christoph Hellwig
2024-12-03 15:31 ` [PATCH 06/12] mm/truncate: add folio_unmap_invalidate() helper Jens Axboe
2024-12-10 11:21 ` Christoph Hellwig
2024-12-12 20:19 ` Jens Axboe
2024-12-03 15:31 ` [PATCH 07/12] fs: add RWF_UNCACHED iocb and FOP_UNCACHED file_operations flag Jens Axboe
2024-12-06 17:35 ` Darrick J. Wong
2024-12-10 11:22 ` Christoph Hellwig
2024-12-12 19:42 ` Jens Axboe
2024-12-03 15:31 ` [PATCH 08/12] mm/filemap: add read support for RWF_UNCACHED Jens Axboe
2024-12-03 15:31 ` [PATCH 09/12] mm/filemap: drop uncached pages when writeback completes Jens Axboe
2024-12-03 15:31 ` [PATCH 10/12] mm/filemap: add filemap_fdatawrite_range_kick() helper Jens Axboe
2024-12-03 15:31 ` [PATCH 11/12] mm/filemap: make buffered writes work with RWF_UNCACHED Jens Axboe
2024-12-06 17:17 ` Darrick J. Wong [this message]
2024-12-06 18:22 ` Jens Axboe
2024-12-10 11:31 ` Christoph Hellwig
2024-12-12 15:51 ` Jens Axboe
2024-12-03 15:31 ` [PATCH 12/12] mm: add FGP_UNCACHED folio creation flag Jens Axboe
2024-12-03 18:23 ` [PATCHSET v6 0/12] Uncached buffered IO Christoph Lameter (Ampere)
2024-12-03 21:06 ` Jens Axboe
2024-12-03 22:16 ` Christoph Lameter (Ampere)
2024-12-03 22:41 ` Jens Axboe
2024-12-04 5:52 ` Darrick J. Wong
2024-12-04 16:36 ` Jens Axboe
2024-12-10 11:11 ` Christoph Hellwig
2024-12-12 15:48 ` Jens Axboe
2024-12-12 16:59 ` Christoph Lameter (Ampere)
2024-12-12 19:14 ` Jens Axboe
2024-12-12 19:35 ` Matthew Wilcox
2024-12-12 19:36 ` Jens Axboe
2024-12-12 20:06 ` Christoph Lameter (Ampere)
2024-12-13 5:04 ` Johannes Weiner
2024-12-13 14:49 ` Jens Axboe
2024-12-06 17:37 ` Darrick J. Wong
2024-12-10 9:48 ` Bharata B Rao
2024-12-12 15:46 ` Jens Axboe
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20241206171740.GD7820@frogsfrogsfrogs \
--to=djwong@kernel.org \
--cc=axboe@kernel.dk \
--cc=bfoster@redhat.com \
--cc=clm@meta.com \
--cc=hannes@cmpxchg.org \
--cc=kirill@shutemov.name \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.