From: Jens Axboe <axboe@kernel.dk>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Linux-MM <linux-mm@kvack.org>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>,
linux-block <linux-block@vger.kernel.org>,
Matthew Wilcox <willy@infradead.org>, Chris Mason <clm@fb.com>,
Dave Chinner <david@fromorbit.com>,
Johannes Weiner <hannes@cmpxchg.org>
Subject: Re: [PATCHSET v3 0/5] Support for RWF_UNCACHED
Date: Wed, 11 Dec 2019 19:03:40 -0700 [thread overview]
Message-ID: <00a5c8b7-215a-7615-156d-d8f3dbb1cd3a@kernel.dk> (raw)
In-Reply-To: <e7fc6b37-8106-4fe2-479c-05c3f2b1c1f1@kernel.dk>
On 12/11/19 6:09 PM, Jens Axboe wrote:
> On 12/11/19 4:41 PM, Jens Axboe wrote:
>> On 12/11/19 1:18 PM, Linus Torvalds wrote:
>>> On Wed, Dec 11, 2019 at 12:08 PM Jens Axboe <axboe@kernel.dk> wrote:
>>>>
>>>> $ cat /proc/meminfo | grep -i active
>>>> Active: 134136 kB
>>>> Inactive: 28683916 kB
>>>> Active(anon): 97064 kB
>>>> Inactive(anon): 4 kB
>>>> Active(file): 37072 kB
>>>> Inactive(file): 28683912 kB
>>>
>>> Yeah, that should not put pressure on some swap activity. We have 28
>>> GB of basically free inactive file data, and the VM is doing something
>>> very very bad if it then doesn't just quickly free it with no real
>>> drama.
>>>
>>> In fact, I don't think it should even trigger kswapd at all, it should
>>> all be direct reclaim. Of course, some of the mm people hate that with
>>> a passion, but this does look like a prime example of why it should
>>> just be done.
>>
>> For giggles, I ran just a single thread on the file set. We're only
>> doing about 100K IOPS at that point, yet when the page cache fills,
>> kswapd still eats 10% cpu. That seems like a lot for something that
>> slow.
>
> Warning, the below is from the really crazy department...
>
> Anyway, I took a closer look at the profiles for the uncached case.
> We're spending a lot of time doing memsets (this is the xa_node init,
> from the radix tree constructor), and call_rcu for the node free later
> on. All wasted time, and something that meant we weren't as close to the
> performance of O_DIRECT as we could be.
>
> So Chris and I started talking about this, and pondered "what would
> happen if we simply bypassed the page cache completely?". Case in point,
> see below incremental patch. We still do the page cache lookup, and use
> that page to copy from if it's there. If the page isn't there, allocate
> one and do IO to it, but DON'T add it to the page cache. With that,
> we're almost at O_DIRECT levels of performance for the 4k read case,
> without 1-2%. I think 512b would look awesome, but we're reading full
> pages, so that won't really help us much. Compared to the previous
> uncached method, this is 30% faster on this device. That's substantial.
>
> Obviously this has issues with truncate that would need to be resolved,
> and it's definitely dirtier. But the performance is very enticing...
Tested and cleaned a bit, and added truncate protection through
inode_dio_begin()/inode_dio_end().
https://git.kernel.dk/cgit/linux-block/commit/?h=buffered-uncached&id=6dac80bc340dabdcbfb4230b9331e52510acca87
This is much faster than the previous page cache dance, and I _think_
we're ok as long as we block truncate and hole punching.
Comments?
--
Jens Axboe
next prev parent reply other threads:[~2019-12-12 2:03 UTC|newest]
Thread overview: 55+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-12-11 15:29 [PATCHSET v3 0/5] Support for RWF_UNCACHED Jens Axboe
2019-12-11 15:29 ` [PATCH 1/5] fs: add read support " Jens Axboe
2019-12-11 15:29 ` [PATCH 2/5] mm: make generic_perform_write() take a struct kiocb Jens Axboe
2019-12-11 15:29 ` [PATCH 3/5] mm: make buffered writes work with RWF_UNCACHED Jens Axboe
2019-12-11 15:29 ` [PATCH 4/5] iomap: pass in the write_begin/write_end flags to iomap_actor Jens Axboe
2019-12-11 17:19 ` Linus Torvalds
2019-12-11 15:29 ` [PATCH 5/5] iomap: support RWF_UNCACHED for buffered writes Jens Axboe
2019-12-11 17:19 ` Matthew Wilcox
2019-12-11 18:05 ` Jens Axboe
2019-12-12 22:34 ` Dave Chinner
2019-12-13 0:54 ` Jens Axboe
2019-12-13 0:57 ` Jens Axboe
2019-12-16 4:17 ` Dave Chinner
2019-12-17 14:31 ` Jens Axboe
2019-12-18 0:49 ` Dave Chinner
2019-12-18 1:01 ` Jens Axboe
2019-12-11 17:37 ` [PATCHSET v3 0/5] Support for RWF_UNCACHED Linus Torvalds
2019-12-11 17:56 ` Jens Axboe
2019-12-11 19:14 ` Linus Torvalds
2019-12-11 19:34 ` Jens Axboe
2019-12-11 20:03 ` Linus Torvalds
2019-12-11 20:08 ` Jens Axboe
2019-12-11 20:18 ` Linus Torvalds
2019-12-11 21:04 ` Johannes Weiner
2019-12-12 1:30 ` Jens Axboe
2019-12-11 23:41 ` Jens Axboe
2019-12-12 1:08 ` Linus Torvalds
2019-12-12 1:11 ` Jens Axboe
2019-12-12 1:22 ` Linus Torvalds
2019-12-12 1:29 ` Jens Axboe
2019-12-12 1:41 ` Linus Torvalds
2019-12-12 1:56 ` Matthew Wilcox
2019-12-12 2:47 ` Linus Torvalds
2019-12-12 17:52 ` Matthew Wilcox
2019-12-12 18:29 ` Linus Torvalds
2019-12-12 20:05 ` Matthew Wilcox
2019-12-12 1:41 ` Jens Axboe
2019-12-12 1:49 ` Linus Torvalds
2019-12-12 1:09 ` Jens Axboe
2019-12-12 2:03 ` Jens Axboe [this message]
2019-12-12 2:10 ` Jens Axboe
2019-12-12 2:21 ` Matthew Wilcox
2019-12-12 2:38 ` Jens Axboe
2019-12-12 22:18 ` Dave Chinner
2019-12-13 1:32 ` Chris Mason
2020-01-07 17:42 ` Christoph Hellwig
2020-01-08 14:09 ` Chris Mason
2020-02-01 10:33 ` Andres Freund
2019-12-11 20:43 ` Matthew Wilcox
2019-12-11 20:04 ` Jens Axboe
2019-12-12 10:44 ` Martin Steigerwald
2019-12-12 15:16 ` Jens Axboe
2019-12-12 21:45 ` Martin Steigerwald
2019-12-12 22:15 ` Jens Axboe
2019-12-12 22:18 ` Linus Torvalds
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=00a5c8b7-215a-7615-156d-d8f3dbb1cd3a@kernel.dk \
--to=axboe@kernel.dk \
--cc=clm@fb.com \
--cc=david@fromorbit.com \
--cc=hannes@cmpxchg.org \
--cc=linux-block@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=torvalds@linux-foundation.org \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).