linux-mm.kvack.org archive mirror
From: Zlatko Calusic <zcalusic@bitsync.net>
To: Mel Gorman <mgorman@suse.de>
Cc: linux-mm <linux-mm@kvack.org>,
	Konstantin Khlebnikov <khlebnikov@openvz.org>
Subject: Re: The pagecache unloved in zone NORMAL?
Date: Sun, 05 May 2013 23:50:43 +0200	[thread overview]
Message-ID: <5186D433.3050301@bitsync.net> (raw)
In-Reply-To: <51671D4D.9080003@bitsync.net>

[-- Attachment #1: Type: text/plain, Size: 3911 bytes --]

On 11.04.2013 22:30, Zlatko Calusic wrote:
> This is something that I've been chasing for months, and I'm getting
> tired of it. :(
>
> The issue has been observed on 4GB RAM x86_64 machines (one server, one
> desktop) without swap subsystem (not even compiled in). The important
> thing to remember about a 4GB x86_64 machine is that the NORMAL zone is
> about 6 times smaller than the DMA32 zone.
>
> As a picture is worth 10000 words, I've attached two graphs that nicely
> show what I've observed. As memory usage slowly rises, the MM subsystem
> gradually evicts pagecache pages from the NORMAL zone, eventually trying
> to get rid of all of them! This process takes days, typically more than
> 5 on this particular server. Of course, this means that eventually the
> zone will be chock-full of anon pages, and without swap, the kernel
> can't do much about it. But as it tries to balance the zone, various
> bad things happen. On the server I've seen sudden freeing of hundreds
> of MB of pagecache; on the desktop there's a general slowdown, sound
> dropouts (HTTP streaming), and so on...

Konstantin's excellent patch, described in more detail at 
http://marc.info/?l=linux-mm&m=136731974301311, is already giving some 
useful additional insight into this problem, just as I expected. Here's 
the data after 31 hours of server uptime (also see the attached graph):

Node 0, zone    DMA32
     nr_inactive_file 443705
   avg_age_inactive_file: 362800
Node 0, zone   Normal
     nr_inactive_file 32832
   avg_age_inactive_file: 38760

I reckon that only the aging of the inactive LRU lists is of interest at 
the moment, because there's currently streaming I/O of about 8MB/s 
that can be seen on the graphs. Here's how I decipher the numbers:

DMA32: 443705 pages * 4k ~ 1733MB; 362800 ms = 362.8 seconds to go 
through the LRU and replace each page in it, which finally gives 
1733/362.8 ~ 4.78 MB/s (the approximate speed at which reclaim is going on)

Normal zone: 32832*4/1024/38.76 ~ 3.31 MB/s

Check: 4.78 + 3.31 ~ 8 MB/s (just about the rate of the read I/O from 
the disk)
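The arithmetic above can be reproduced with a short script (just a sketch of my model; the inputs are the numbers quoted from the patched /proc/zoneinfo output, and a 4 KiB page size is assumed, as on x86_64):

```python
# Reclaim rate per zone: inactive file LRU size divided by the average
# time a page survives on it (avg_age_inactive_file, in milliseconds).
PAGE_KB = 4  # assuming 4 KiB pages (x86_64)

def reclaim_rate_mb_s(nr_inactive_file, avg_age_ms):
    """MB/s at which the inactive file LRU is cycled through."""
    size_mb = nr_inactive_file * PAGE_KB / 1024
    return size_mb / (avg_age_ms / 1000)

dma32 = reclaim_rate_mb_s(443705, 362800)   # ~4.78 MB/s
normal = reclaim_rate_mb_s(32832, 38760)    # ~3.31 MB/s
print(round(dma32, 2), round(normal, 2), round(dma32 + normal, 2))
```

The sum of the two rates comes out at ~8.09 MB/s, which matches the observed ~8 MB/s of streaming reads from the disk.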

So, if my calculations are right and my model makes sense (Konstantin, 
chime in if I got something wrong!), reclaim is going through the 
pages in those 2 zones at a very similar speed, although there are 
already 13 times fewer pages in the Normal zone available for caching 
the streaming I/O. If this behavior continues until the Normal zone gets 
practically washed out of file pages (guaranteed in a few days), then we 
will measure the TTL of pages in the Normal zone in milliseconds. Not a 
very useful cache, you'll agree. Of course, it's not a problem for 
streaming reads, but dirty pages that end up there will be written out 
practically synchronously, and then it's no wonder that the desktop at 
those moments starts behaving worse than a trusty old 486DX2 with 16MB 
of RAM once did. :(

The only question I have is, is this a design mistake, or a plain bug?

I strongly believe that pages should be reclaimed at a speed 
proportional to the LRU size. After all, all those pages are the same as 
far as I/O is concerned, so there's no reason to throw out some pages 
after only 38 seconds while others are privileged to spend 6 minutes in 
memory. Those are the numbers from the data above, and we'll see by the 
end of next week how bad it can really get.
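To put a number on that claim, here's a toy comparison of the observed per-zone page lifetimes against what they would be under a hypothetical policy (not what the kernel does today) where each zone's reclaim rate is proportional to its LRU size:

```python
# Sizes in MB and rates in MB/s, taken from the figures computed above.
sizes = {"DMA32": 1733.2, "Normal": 128.25}
rates = {"DMA32": 4.78, "Normal": 3.31}

# Observed lifetime of a page on each LRU: size / reclaim rate.
observed_ttl = {z: sizes[z] / rates[z] for z in sizes}

# Hypothetical policy: split the same total reclaim rate in proportion
# to LRU size. Every zone's TTL then collapses to total_size/total_rate.
total_rate = sum(rates.values())
total_size = sum(sizes.values())
proportional_ttl = {z: sizes[z] / (total_rate * sizes[z] / total_size)
                    for z in sizes}

print(observed_ttl)      # DMA32 ~363 s, Normal ~39 s
print(proportional_ttl)  # both ~230 s
```

Under proportional reclaim, both zones would keep a page cached for the same ~230 seconds, instead of 6 minutes in one zone and 39 seconds in the other.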

This imbalance is possibly the main reason why file pages are pushed out 
of the Normal zone too aggressively in the first place. Probably, if we 
could balance the reclaim speed, the whole problem would disappear. It 
looks like faster reclaim in the smaller zone manages to throw more file 
pages out of it (anon pages replace them more easily), which in turn 
makes the file LRUs even smaller, which produces even faster reclaim, 
which... you get the idea: a kind of positive feedback loop that feeds 
on itself. The kind that always ends with a bang. ;)
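The feedback loop can be sketched with a toy iteration (pure illustration, not a model of the actual reclaim code; the per-round anon takeover fraction is an arbitrary assumption):

```python
# Each round, reclaim in the small zone frees file pages at a roughly
# constant speed, and some fraction of the freed pages is re-taken by
# anon pages. The file LRU shrinks, so its page TTL (size / reclaim
# rate) shrinks with it -- the loop that feeds on itself.
file_mb = 128.0        # file LRU in the Normal zone, from the data above
reclaim_mb_s = 3.3     # observed reclaim speed in that zone
anon_takeover = 0.1    # assumed fraction of freed pages lost to anon

for day in range(5):
    ttl = file_mb / reclaim_mb_s
    print(f"day {day}: file LRU {file_mb:6.1f} MB, page TTL {ttl:5.1f} s")
    file_mb *= (1 - anon_takeover)  # anon pages eat into the file LRU
```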

-- 
Zlatko

[-- Attachment #2: screenshot12.png --]
[-- Type: image/png, Size: 51090 bytes --]

Thread overview: 4+ messages
2013-04-11 20:30 The pagecache unloved in zone NORMAL? Zlatko Calusic
2013-05-05 21:50 ` Zlatko Calusic [this message]
2013-05-09 20:24   ` Zlatko Calusic
2013-05-12 17:53   ` Rik van Riel
