From: Ric Mason <ric.masonn@gmail.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jan Kara <jack@suse.cz>, Howard Chu <hyc@symas.com>,
linux-kernel <linux-kernel@vger.kernel.org>,
linux-mm@kvack.org
Subject: Re: mmap vs fs cache
Date: Sat, 09 Mar 2013 10:34:32 +0800
Message-ID: <513A9FB8.1050807@gmail.com>
In-Reply-To: <20130308020854.GC23767@cmpxchg.org>

Hi Johannes,
On 03/08/2013 10:08 AM, Johannes Weiner wrote:
> On Thu, Mar 07, 2013 at 04:43:12PM +0100, Jan Kara wrote:
>> Added mm list to CC.
>>
>> On Tue 05-03-13 09:57:34, Howard Chu wrote:
>>> I'm testing our memory-mapped database code on a small VM. The
>>> machine has 32GB of RAM and the size of the DB on disk is ~44GB. The
>>> database library mmaps the entire file as a single region and starts
>>> accessing it as a tree of B+trees. Running on an Ubuntu 3.5.0-23
>>> kernel, XFS on a local disk.
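
For reference, a minimal C sketch of the mapping setup described above: one
read-only, shared mapping of the whole file, faulted in on access. The file
name and the single page touch are stand-ins, not the actual database code.

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void)
    {
            int fd = open("data.mdb", O_RDONLY);   /* placeholder path */
            struct stat st;
            void *base;

            if (fd < 0) {
                    perror("open");
                    return 1;
            }
            if (fstat(fd, &st) < 0) {
                    perror("fstat");
                    return 1;
            }

            /* One mapping for the entire file; pages are faulted in on
             * access and accounted to the shared page cache, not to
             * anonymous process memory. */
            base = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
            if (base == MAP_FAILED) {
                    perror("mmap");
                    return 1;
            }

            /* Touch one byte to fault a page in (stand-in for the B+tree walk). */
            printf("first byte: %d\n", ((const unsigned char *)base)[0]);

            munmap(base, st.st_size);
            close(fd);
            return 0;
    }
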
>>>
>>> If I start running read-only queries against the DB with a freshly
>>> started server, I see that my process (OpenLDAP slapd) quickly grows
>>> to an RSS of about 16GB in tandem with the FS cache. (I.e., "top"
>>> shows 16GB cached, and slapd is 16GB.)
>>> If I confine my queries to the first 20% of the data then it all
>>> fits in RAM and queries are nice and fast.
>>>
>>> If I extend the query range to cover more of the data, approaching
>>> the size of physical RAM, I see something strange - the FS cache
>>> keeps growing, but the slapd process size grows at a slower rate.
>>> This is rather puzzling to me since the only thing triggering reads
>>> is accesses through the mmap region. Eventually the FS cache grows
>>> to basically all of the 32GB of RAM (+/- some text/data space...)
>>> but the slapd process only reaches 25GB, at which point it actually
>>> starts to shrink - apparently the FS cache is now stealing pages
>>> from it. I find that a bit puzzling; if the pages are present in
>>> memory, and the only reason they were paged in was to satisfy an
>>> mmap reference, why aren't they simply assigned to the slapd
>>> process?
>>>
>>> The current behavior gets even more aggravating: I can run a test
>>> that spans exactly 30GB of the data. One would expect that the slapd
>>> process should simply grow to 30GB in size, and then remain static
>>> for the remainder of the test. Instead, the server grows to 25GB,
>>> the FS cache grows to 32GB, and starts stealing pages from the
>>> server, shrinking it back down to 19GB or so.
>>>
>>> If I do an "echo 1 > /proc/sys/vm/drop_caches" at the onset of this
>>> condition, the FS cache shrinks back to 25GB, matching the slapd
>>> process size.
>>> This then frees up enough RAM for slapd to grow further. If I don't
>>> do this, the test is constantly paging in data from disk. Even so,
>>> the FS cache continues to grow faster than the slapd process size,
>>> so the system may run out of free RAM again, and I have to drop
>>> caches multiple times before slapd finally grows to the full 30GB.
>>> Once it gets to that size the test runs entirely from RAM with zero
>>> I/Os, but it doesn't get there without a lot of babysitting.
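
For completeness, a rough C equivalent of that drop_caches step (needs root;
the helper name is made up for illustration):

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Rough equivalent of "echo 1 > /proc/sys/vm/drop_caches".
     * sync() first, since drop_caches only discards clean page cache. */
    int drop_page_cache(void)
    {
            int fd;

            sync();
            fd = open("/proc/sys/vm/drop_caches", O_WRONLY);
            if (fd < 0) {
                    perror("open drop_caches");
                    return -1;
            }
            if (write(fd, "1", 1) != 1) {
                    perror("write drop_caches");
                    close(fd);
                    return -1;
            }
            close(fd);
            return 0;
    }
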
>>>
>>> 2 questions:
>>> why is there data in the FS cache that isn't owned by (the mmap
>>> of) the process that caused it to be paged in in the first place?
> The filesystem cache is shared among processes because the filesystem
> is also shared among processes. If another task were to access the
> same file, we still should only have one copy of that data in memory.
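
One way to observe this from userspace is mincore(2), which reports which
pages of a mapping are currently resident in the (shared) page cache. A small
sketch; the helper name and minimal error handling are illustrative, and
"base"/"len" would come from the mmap() call:

    #define _DEFAULT_SOURCE
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/mman.h>
    #include <unistd.h>

    /* Count how many pages of a mapping are currently resident. */
    size_t resident_pages(void *base, size_t len)
    {
            long psize = sysconf(_SC_PAGESIZE);
            size_t npages = (len + psize - 1) / psize;
            unsigned char *vec = malloc(npages);
            size_t resident = 0, i;

            if (!vec || mincore(base, len, vec) != 0) {
                    perror("mincore");
                    free(vec);
                    return 0;
            }
            for (i = 0; i < npages; i++)
                    resident += vec[i] & 1;   /* bit 0 = page resident */
            free(vec);
            return resident;
    }
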
>
> It sounds to me like slapd is itself caching all the data it reads.
> If that is true, shouldn't it really be using direct IO to prevent
> this double buffering of filesystem data in memory?
In which cases is direct I/O the better choice, and in which cases is the
page cache better?
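
To make the double-buffering point concrete, a sketch of an O_DIRECT read,
which bypasses the page cache so the application's own cache holds the only
copy. The file name, buffer size, and 4096-byte alignment are assumptions;
the required alignment depends on the device and filesystem:

    #define _GNU_SOURCE         /* for O_DIRECT */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void)
    {
            void *buf;
            ssize_t n;
            int fd = open("data.mdb", O_RDONLY | O_DIRECT);

            if (fd < 0) {
                    perror("open O_DIRECT");
                    return 1;
            }
            /* O_DIRECT requires aligned buffers, offsets and lengths. */
            if (posix_memalign(&buf, 4096, 4096) != 0)
                    return 1;

            n = pread(fd, buf, 4096, 0);   /* aligned offset and length */
            if (n < 0)
                    perror("pread");

            free(buf);
            close(fd);
            return 0;
    }
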
>
>>> is there a tunable knob to discourage the page cache from stealing
>>> from the process?
> Try reducing /proc/sys/vm/swappiness, which ranges from 0-100 and
> defaults to 60.
Why reduce it? IIUC, swappiness determines how aggressively anonymous pages
are reclaimed: the higher the value, the more anonymous pages get reclaimed.
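
For reference, a sketch of adjusting that knob at runtime, equivalent to
"sysctl vm.swappiness=N" (needs root; the helper name and the example value
are illustrative):

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    /* Write a new value to /proc/sys/vm/swappiness, e.g. set_swappiness("10"). */
    int set_swappiness(const char *val)
    {
            ssize_t len = (ssize_t)strlen(val);
            int fd = open("/proc/sys/vm/swappiness", O_WRONLY);

            if (fd < 0) {
                    perror("open swappiness");
                    return -1;
            }
            if (write(fd, val, len) != len) {
                    perror("write swappiness");
                    close(fd);
                    return -1;
            }
            close(fd);
            return 0;
    }
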