From: Jens Axboe <axboe@kernel.dk>
To: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
    linux-block@vger.kernel.org
Cc: willy@infradead.org, clm@fb.com, torvalds@linux-foundation.org,
    david@fromorbit.com
Subject: Re: [PATCHSET v4 0/5] Support for RWF_UNCACHED
Date: Thu, 12 Dec 2019 17:53:27 -0700
Message-ID: <1724f1c7-d404-9ce7-48ab-0d5f6f5dece5@kernel.dk>
In-Reply-To: <20191212190133.18473-1-axboe@kernel.dk>

On 12/12/19 12:01 PM, Jens Axboe wrote:
> Recently someone asked me how io_uring buffered IO compares to mmapped
> IO in terms of performance. So I ran some tests with buffered IO, and
> found the experience to be somewhat painful. The test case is pretty
> basic, random reads over a dataset that's 10x the size of RAM.
> Performance starts out fine, and then the page cache fills up and we
> hit a throughput cliff. CPU usage of the IO threads goes up, and we have
> kswapd spending 100% of a core trying to keep up. Seeing that, I was
> reminded of the many complaints I hear about buffered IO, and the fact
> that most of the folks complaining will ultimately bite the bullet and
> move to O_DIRECT to just get the kernel out of the way.
>
> But I don't think it needs to be like that. Switching to O_DIRECT isn't
> always easily doable. The buffers have different lifetimes, size and
> alignment constraints, etc. On top of that, mixing buffered and O_DIRECT
> can be painful.
>
> Seems to me that we have an opportunity to provide something that sits
> somewhere in between buffered and O_DIRECT, and this is where
> RWF_UNCACHED enters the picture. If this flag is set on IO, we get the
> following behavior:
>
> - If the data is in cache, it remains in cache and the copy (in or out)
> is served to/from that. This is true for both reads and writes.
>
> - For writes, if the data is NOT in cache, we add it while performing the
> IO. When the IO is done, we remove it again.
>
> - For reads, if the data is NOT in the cache, we allocate a private page
> and use that for IO. When the IO is done, we free this page. The page
> never sees the page cache.
>
> With this, I can do 100% smooth buffered reads or writes without pushing
> the kernel to the state where kswapd is sweating bullets. In fact it
> doesn't even register.
>
> Comments appreciated! This should work on any standard file system,
> using either the generic helpers or iomap. I have tested ext4 and xfs
> for the right read/write behavior, but no further validation has been
> done yet. This version contains the bigger prep patch of switching
> iomap_apply() and actors to struct iomap_data, and I hope I didn't
> mess that up too badly. I'll try and exercise it all. I've done XFS
> mounts and reads+writes and it seems happy from that POV at least.
>
> The core of the changes is actually really small; the majority of
> the diff is just prep work to get there.
>
> Patches are against current git, and can also be found here:
>
> https://git.kernel.dk/cgit/linux-block/log/?h=buffered-uncached
>
>  fs/ceph/file.c          |   2 +-
>  fs/dax.c                |  25 +++--
>  fs/ext4/file.c          |   2 +-
>  fs/iomap/apply.c        |  50 ++++++---
>  fs/iomap/buffered-io.c  | 225 +++++++++++++++++++++++++---------------
>  fs/iomap/direct-io.c    |  57 +++++-----
>  fs/iomap/fiemap.c       |  48 +++++----
>  fs/iomap/seek.c         |  64 +++++++-----
>  fs/iomap/swapfile.c     |  27 ++---
>  fs/nfs/file.c           |   2 +-
>  include/linux/fs.h      |   7 +-
>  include/linux/iomap.h   |  20 +++-
>  include/uapi/linux/fs.h |   5 +-
>  mm/filemap.c            |  89 +++++++++++++---
>  14 files changed, 416 insertions(+), 207 deletions(-)
>
> Changes since v3:
> - Add iomap_actor_data to cut down on arguments
> - Fix bad flag drop in iomap_write_begin()
> - Remove unused IOMAP_WRITE_F_UNCACHED flag
> - Don't use the page cache at all for reads

I had a silly LRU error in v4, and also an XFS flags error. I'm not
going to re-post just yet because of this, but please use:

https://git.kernel.dk/cgit/linux-block/log/?h=buffered-uncached

if you're going to test this. You can pull it at:

git://git.kernel.dk/linux-block buffered-uncached

Those are the only two changes since v4. I'll throw a v5 out there a bit
later.

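
For anyone who wants to reason about the three cache rules quoted above
before testing the branch, here is a toy user-space model in Python. It is
only a sketch and not the kernel implementation: the ToyPageCache class, its
dict-based "cache" and "backing store", and the immediate writeback are all
assumptions made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyPageCache:
    """Toy model of the quoted RWF_UNCACHED rules; hypothetical, not kernel code."""

    backing: dict = field(default_factory=dict)  # page index -> data "on disk"
    cached: dict = field(default_factory=dict)   # page index -> data in the cache

    def read(self, index, uncached=False):
        if index in self.cached:
            # Data already in cache is served from it and remains cached.
            return self.cached[index]
        data = self.backing.get(index, b"")
        if not uncached:
            # Ordinary buffered read: the page is added to the cache.
            self.cached[index] = data
        # Uncached miss: a private copy is returned and never enters the cache.
        return data

    def write(self, index, data, uncached=False):
        was_cached = index in self.cached
        # Both buffered and uncached writes go through the cache.
        self.cached[index] = data
        # Assumption: writeback is modeled as immediate to keep the sketch short.
        self.backing[index] = data
        if uncached and not was_cached:
            # The page was added only for this IO, so drop it once the IO is done.
            del self.cached[index]


cache = ToyPageCache(backing={0: b"a", 1: b"b"})
cache.read(0)                        # buffered read populates the cache
cache.read(1, uncached=True)         # uncached miss leaves no trace
cache.write(2, b"c", uncached=True)  # added for the IO, then removed
cache.write(0, b"z", uncached=True)  # already cached, so it stays cached
print(sorted(cache.cached))          # prints [0]
```

Only page 0 ends up cached, because it was brought in by a normal buffered
read; the uncached operations leave the cache population as they found it.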
--
Jens Axboe