From: Dave Hansen <dave@linux.vnet.ibm.com>
To: "Peter Schüller" <scode@spotify.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
linux-kernel@vger.kernel.org,
Mattias de Zalenski <zalenski@spotify.com>,
linux-mm@kvack.org
Subject: Re: Sudden and massive page cache eviction
Date: Tue, 23 Nov 2010 08:19:31 -0800
Message-ID: <1290529171.2390.7994.camel@nimitz>
In-Reply-To: <AANLkTik2Fn-ynUap2fPcRxRdKA=5ZRYG0LJTmqf80y+q@mail.gmail.com>
On Tue, 2010-11-23 at 10:44 +0100, Peter Schuller wrote:
> > You don't have anybody messing with /proc/sys/vm/drop_caches, do you?
>
> Highly unlikely given that (1) evictions, while often very
> significant, are usually not *complete* (although the first graph
> example I provided had a more or less complete eviction), (2) the
> evictions are not obviously periodic in a way that would indicate
> some kind of cron job, and (3) we see the evictions happening across
> a wide variety of machines.
>
> So yes, I feel confident that we are not accidentally doing that.
Yeah, drop_caches doesn't seem very likely.
Your postgres data looks the cleanest and is probably the easiest to
analyze. Might as well start there:
http://files.spotify.com/memcut/postgresql_weekly.png
As you said, it might not be the same as the others, but it's a decent
place to start. If someone used drop_caches or if someone was randomly
truncating files, we'd expect to see the active/inactive lines both drop
by relatively equivalent amounts, and see them happen at _exactly_ the
same time as the cache eviction. The eviction about 1/3 of the way
through Wednesday in the above graph kinda looks this way, but it's the
exception.
Just eyeballing it, _most_ of the evictions seem to happen after some
movement in the active/inactive lists. We see an "inactive" uptick as
we start to launder pages, and the page activation doesn't keep up with
it. This is a _bit_ weird since we don't see any slab cache or other
users coming to fill the new space. Something _wanted_ the memory, so
why isn't it being used?
Do you have any large page (hugetlbfs) or other multi-order (> 1 page)
allocations happening in the kernel?
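A quick way to check is to look at the HugePages_* counters in
/proc/meminfo and at the per-order free counts in /proc/buddyinfo.
A rough sketch (Python just for convenience; the output format is only
a suggestion, the /proc formats assumed are the usual ones):

# Rough check: any hugetlbfs pages reserved, and how do the
# higher-order free counts look?
with open('/proc/meminfo') as f:
    for line in f:
        if line.startswith('HugePages'):
            print(line.rstrip())

with open('/proc/buddyinfo') as f:
    for line in f:
        fields = line.split()
        # e.g. "Node 0, zone Normal 3 5 2 ..." -> free-page counts
        # per order start at field 4 (order 0 first)
        print('%s %s zone %s: %s' % (fields[0], fields[1], fields[3],
                                     ' '.join(fields[4:])))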
If you could start recording /proc/{vmstat,buddyinfo,meminfo,slabinfo},
it would be immensely useful. The munin graphs are really great, but
they don't have the detail you can get from stuff like vmstat.
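Even something as dumb as a loop that appends timestamped copies of
those files once a minute would do. A minimal sketch (the log path and
interval are just suggestions; run it as root so slabinfo is readable):

# Dumb recorder: append a timestamped copy of each file once a minute.
import time

FILES = ['/proc/vmstat', '/proc/buddyinfo', '/proc/meminfo',
         '/proc/slabinfo']

while True:
    stamp = time.strftime('%Y-%m-%d %H:%M:%S')
    with open('/var/tmp/proc-snapshots.log', 'a') as out:
        for path in FILES:
            out.write('=== %s %s ===\n' % (stamp, path))
            out.write(open(path).read())
    time.sleep(60)

Snapshots from just before and just after one of these evictions would
tell us a lot about whether higher-order allocations are involved.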
> Further, we have observed the kernel's unwillingness to retain data in
> page cache under interesting circumstances:
>
> (1) page cache eviction happens
> (2) we warm up our BDB files by cat:ing them (simple but effective)
> (3) within a matter of minutes, while there are still several GB of
> free memory (truly free, not page cache), these files are evicted (as
> evidenced by re-cat:ing them a little while later)
>
> We understand this latest observation may be due to NUMA-related
> allocation issues, and we should probably try to use numactl to ask
> for a more even allocation. We have not yet tried this. However, it
> is not clear how any issue of that kind would cause sudden eviction
> of data already *in* the page cache (on whichever node).
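As an aside, rather than re-cat:ing the files to check whether they're
still resident, you can ask the kernel directly with mincore(2). A
rough sketch, purely as a verification aid; cached_pages is a made-up
helper, and the usual Linux mmap()/mincore() constants are assumed:

# Sketch: report how much of a file is resident in the page cache,
# using mmap()+mincore() via ctypes.
import ctypes, ctypes.util, os, sys

libc = ctypes.CDLL(ctypes.util.find_library('c'), use_errno=True)
libc.mmap.restype = ctypes.c_void_p
libc.mmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int,
                      ctypes.c_int, ctypes.c_int, ctypes.c_long]

PROT_READ, MAP_SHARED = 0x1, 0x1
PAGE = os.sysconf('SC_PAGE_SIZE')

def cached_pages(path):
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        if size == 0:
            return 0, 0
        addr = libc.mmap(None, size, PROT_READ, MAP_SHARED, fd, 0)
        pages = (size + PAGE - 1) // PAGE
        vec = (ctypes.c_ubyte * pages)()
        # mincore() sets bit 0 of each byte if the page is resident.
        libc.mincore(ctypes.c_void_p(addr), ctypes.c_size_t(size), vec)
        libc.munmap(ctypes.c_void_p(addr), ctypes.c_size_t(size))
        return sum(b & 1 for b in vec), pages
    finally:
        os.close(fd)

for path in sys.argv[1:]:
    resident, total = cached_pages(path)
    print('%s: %d/%d pages resident' % (path, resident, total))

Run that against the BDB files right after warming them and again a few
minutes later, and you get a direct resident-page count instead of
guessing from cat timings.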
For a page-cache-heavy workload where you care a lot more about things
being _in_ cache rather than having good NUMA locality, you probably
want "zone_reclaim_mode" set to 0:
http://www.kernel.org/doc/Documentation/sysctl/vm.txt
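Flipping it is a one-liner; a trivial sketch (needs root, and assumes
your kernel was built with NUMA support so the knob exists):

# Same effect as 'echo 0 > /proc/sys/vm/zone_reclaim_mode'.
open('/proc/sys/vm/zone_reclaim_mode', 'w').write('0\n')
# Read it back to confirm.
print(open('/proc/sys/vm/zone_reclaim_mode').read().strip())

Putting vm.zone_reclaim_mode = 0 in /etc/sysctl.conf keeps it across
reboots.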
That'll be a bit more comprehensive than messing with numactl. It
really is the best thing if you just don't care about NUMA latencies all
that much. What kind of hardware is this, btw?
-- Dave