From: Martin Steigerwald <martin@lichtvoll.de>
To: Jens Axboe <axboe@kernel.dk>
Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
linux-block@vger.kernel.org, willy@infradead.org, clm@fb.com,
torvalds@linux-foundation.org, david@fromorbit.com
Subject: Re: [PATCHSET v3 0/5] Support for RWF_UNCACHED
Date: Thu, 12 Dec 2019 11:44:59 +0100
Message-ID: <63049728.ylUViGSH3C@merkaba>
In-Reply-To: <20191211152943.2933-1-axboe@kernel.dk>
Hi Jens.
Jens Axboe - 11.12.19, 16:29:38 CET:
> Recently someone asked me how io_uring buffered IO compares to mmaped
> IO in terms of performance. So I ran some tests with buffered IO, and
> found the experience to be somewhat painful. The test case is pretty
> basic, random reads over a dataset that's 10x the size of RAM.
> Performance starts out fine, and then the page cache fills up and we
> hit a throughput cliff. CPU usage of the IO threads goes up, and we have
> kswapd spending 100% of a core trying to keep up. Seeing that, I was
> reminded of the many complaints I hear about buffered IO, and the
> fact that most of the folks complaining will ultimately bite the
> bullet and move to O_DIRECT to just get the kernel out of the way.
>
> But I don't think it needs to be like that. Switching to O_DIRECT
> isn't always easily doable. The buffers have different life times,
> size and alignment constraints, etc. On top of that, mixing buffered
> and O_DIRECT can be painful.
>
> Seems to me that we have an opportunity to provide something that sits
> somewhere in between buffered and O_DIRECT, and this is where
> RWF_UNCACHED enters the picture. If this flag is set on IO, we get
> the following behavior:
>
> - If the data is in cache, it remains in cache and the copy (in or
> out) is served to/from that.
>
> - If the data is NOT in cache, we add it while performing the IO. When
> the IO is done, we remove it again.
>
> With this, I can do 100% smooth buffered reads or writes without
> pushing the kernel to the state where kswapd is sweating bullets. In
> fact it doesn't even register.
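
To check that I read the two rules above correctly, here is a toy model of them. It is only a sketch of the described semantics, not kernel code: the ToyPageCache class and its read() method are made-up names for illustration, and pages are reduced to plain integer indices.

```python
class ToyPageCache:
    """Toy model of the described RWF_UNCACHED read semantics (not kernel code)."""

    def __init__(self):
        self.pages = set()  # indices of pages currently held in the cache

    def read(self, index, uncached=False):
        """Simulate reading one page; return True on a cache hit, False on a miss."""
        if index in self.pages:
            # Data is already in cache: serve it from there and leave it cached,
            # whether or not the uncached flag is set.
            return True
        # Data is not in cache: the page is added while the IO is performed.
        self.pages.add(index)
        # ... the copy to the caller's buffer would happen here ...
        if uncached:
            # The IO is done: remove the page that this IO instantiated.
            self.pages.discard(index)
        return False
```

In this model a normal read of page 1 leaves it cached, an uncached read of page 2 leaves no trace, and a later uncached read of page 1 is a hit that keeps page 1 in the cache.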
A question from a user or Linux Performance trainer perspective:
How does this compare with posix_fadvise() with POSIX_FADV_DONTNEED, which
the nocache [1] command, for example, uses? An excerpt from the
posix_fadvise(2) manpage:

    POSIX_FADV_DONTNEED
        The specified data will not be accessed in the near future.

        POSIX_FADV_DONTNEED attempts to free cached pages associated
        with the specified region. This is useful, for example, while
        streaming large files. A program may periodically request the
        kernel to free cached data that has already been used, so that
        more useful cached pages are not discarded instead.

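
To make the comparison concrete, this is roughly the pattern I have in mind for streaming reads. It is a minimal sketch using Python's os.posix_fadvise() wrapper; the function name stream_without_caching and the 1 MiB chunk size are my own choices for illustration, and I am not claiming this is how nocache itself is implemented. The advice is only a hint, so the kernel may still keep some pages.

```python
import os

CHUNK_SIZE = 1 << 20  # 1 MiB per read; an arbitrary choice for this sketch


def stream_without_caching(path, chunk_size=CHUNK_SIZE):
    """Read a file sequentially and ask the kernel to drop each chunk's
    cached pages after it has been consumed. Returns the bytes read."""
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            data = os.read(fd, chunk_size)
            if not data:
                break  # end of file
            # ... process `data` here ...
            # Advise the kernel that the range just read is no longer needed.
            # This is advisory: it attempts to free the pages, nothing more.
            os.posix_fadvise(fd, total, len(data), os.POSIX_FADV_DONTNEED)
            total += len(data)
    finally:
        os.close(fd)
    return total
```

Unlike the flag proposed in the patchset, this drops the range even if the pages were already cached before the read, and the application has to issue the extra call itself after every chunk.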
[1] Packaged in Debian as nocache, or available here:
https://github.com/Feh/nocache
In any case, it would be nice to have such an option in rsync… I still have
not changed my backup script to call rsync via nocache.
Thanks,
Martin
> Comments appreciated! This should work on any standard file system,
> using either the generic helpers or iomap. I have tested ext4 and xfs
> for the right read/write behavior, but no further validation has been
> done yet. Patches are against current git, and can also be found here:
>
> https://git.kernel.dk/cgit/linux-block/log/?h=buffered-uncached
>
>  fs/ceph/file.c          |  2 +-
>  fs/dax.c                |  2 +-
>  fs/ext4/file.c          |  2 +-
>  fs/iomap/apply.c        | 26 ++++++++++-
>  fs/iomap/buffered-io.c  | 54 ++++++++++++++++-------
>  fs/iomap/direct-io.c    |  3 +-
>  fs/iomap/fiemap.c       |  5 ++-
>  fs/iomap/seek.c         |  6 ++-
>  fs/iomap/swapfile.c     |  2 +-
>  fs/nfs/file.c           |  2 +-
>  include/linux/fs.h      |  7 ++-
>  include/linux/iomap.h   | 10 ++++-
>  include/uapi/linux/fs.h |  5 ++-
>  mm/filemap.c            | 95 ++++++++++++++++++++++++++++++++++++-----
>  14 files changed, 181 insertions(+), 40 deletions(-)
>
> Changes since v2:
> - Rework the write side according to Chinner's suggestions. Much
> cleaner this way. It does mean that we invalidate the full write
> region if just ONE page (or more) had to be created, where before it
> was more granular. I don't think that's a concern, and on the plus
> side, we now no longer have to chunk invalidations into 15/16 pages
> at a time.
> - Cleanups
>
> Changes since v1:
> - Switch to pagevecs for write_drop_cached_pages()
> - Use page_offset() instead of manual shift
> - Ensure we hold a reference on the page between calling ->write_end()
> and checking the mapping on the locked page
> - Fix XFS multi-page streamed writes, we'd drop the UNCACHED flag
> after the first page
--
Martin