linux-mm.kvack.org archive mirror
From: Zlatko Calusic <zcalusic@bitsync.net>
To: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: linux-mm@kvack.org
Subject: Re: [PATCH RFC] mm: lru milestones, timestamps and ages
Date: Sat, 04 May 2013 15:32:58 +0200
Message-ID: <51850E0A.5010803@bitsync.net>
In-Reply-To: <5184F6C9.4060506@openvz.org>

On 04.05.2013 13:53, Konstantin Khlebnikov wrote:
> Zlatko Calusic wrote:
>> On 30.04.2013 13:02, Konstantin Khlebnikov wrote:
>>> This patch adds an engine for estimating the rotation time of pages in
>>> LRU lists.
>>>
>>> It adds a bunch of 'milestones' to each struct lruvec and inserts them
>>> into the LRU lists periodically. Milestones flow through the LRU together
>>> with pages and bring a timestamp to the end of the LRU. Because milestones
>>> are embedded in the lruvec, they can easily be distinguished from pages
>>> by comparing pointers. Only a few functions need to care about that.
>>>
>>> This machinery provides a discrete-time estimate of the age of pages at
>>> the end of each LRU, and of the average age of each kind of evictable
>>> LRU in each zone.
>>
>> Great stuff!
>
> Thanks!
>
>>
>> Believe it or not, I had an idea of writing something similar to this,
>> but of course having an idea and actually implementing it are two very
>> different things. Thank you for your work!
>>
>> I will use this to prove (or not) that file pages in the normal zone
>> on a 4GB RAM machine are reused waaaay too soon. Actually, I already
>> have the patch applied and running on the desktop, but it should be
>> much more useful on server workloads. Desktops have erratic load and
>> can go for a long time with very little I/O activity. But, here are
>> the current numbers anyway:
>>
>> Node 0, zone DMA32
>> pages free 5371
>> nr_inactive_anon 4257
>> nr_active_anon 139719
>> nr_inactive_file 617537
>> nr_active_file 51671
>> inactive_ratio: 5
>> avg_age_inactive_anon: 2514752
>> avg_age_active_anon: 2514752
>> avg_age_inactive_file: 876416
>> avg_age_active_file: 2514752
>> Node 0, zone Normal
>> pages free 424
>> nr_inactive_anon 253
>> nr_active_anon 54480
>> nr_inactive_file 63274
>> nr_active_file 44116
>> inactive_ratio: 1
>> avg_age_inactive_anon: 2531712
>> avg_age_active_anon: 2531712
>> avg_age_inactive_file: 901120
>> avg_age_active_file: 2531712
>>
>>> In our kernel we use a similar engine as the source of statistics for
>>> the scheduler in the memory reclaimer. This is an O(1) scheduler which
>>> shifts vmscan priorities for lru vectors depending on their sizes,
>>> limits and ages. It tries to balance memory pressure among containers.
>>> I'll try to rework it for the mainline kernel soon.
>>>
>>> It seems these ages can also be used for optimal distribution of memory
>>> pressure between file and anon pages, and probably for balancing
>>> pressure among zones.
>>
>> This all sounds very promising, especially because I currently observe
>> quite a bit of imbalance among zones.
>
> As I see it, the most likely reason for such imbalances is the 'break'
> condition inside shrink_lruvec(). So you can try disabling it and see
> what happens.

Thanks for the hint. I will pay some more attention to this function
the next time I investigate the code.

>
> But these numbers from your desktop don't actually prove this problem.
> It seems the difference between zones is within the precision of this
> method. I don't know how to describe this precisely. Probably the
> irregularity between milestones should also be taken into account to
> describe the current situation and the quality of the measurement.
>

Ah, no, the numbers were more like a proof that your patch is running 
fine, nothing specific about them. I was just making a quick check that 
your patch is stable enough before I run it in production, and it seems 
it's working just fine.

In the next hour or so I will patch the kernel on the server where I 
intend to do much more analysis. I also prepared a set of graphs based 
on the numbers your code provides. Based on the preliminary tests, I 
believe that I'll be interested only in the aging of the inactive file 
lists. What I'm after is the bug explained here
http://marc.info/?l=linux-mm&m=136571221426984 and, if I'm right, your
patch will help to better reveal the extreme imbalance observed between
DMA32 and Normal zone file LRU aging. But only on 4GB nodes; I haven't
seen anything similar on 8GB nodes, where the DMA32 and Normal zones are
approximately the same size.
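The milestone mechanism described in the quoted RFC can be illustrated with a small user-space sketch in C. This is not the patch's code: every name here (struct lruvec_sketch, struct fake_page, NR_MILESTONES, insert_milestone(), reclaim_one(), tail_age()) is hypothetical, time is a plain counter, and milestones are recognized by comparing a list node's address against the milestones embedded in the lruvec, as the RFC describes. Pages that reach the tail after a milestone has been consumed were added after that milestone's timestamp, so tail_age() is an upper bound whose resolution is limited by the spacing between milestones.

```c
/* Hypothetical sketch of LRU milestones; not the RFC patch's actual code. */
#include <assert.h>
#include <stddef.h>

struct list_head { struct list_head *prev, *next; };

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void list_init(struct list_head *h) { h->prev = h->next = h; }

/* Insert n right after h, i.e. at the head (most recently added end). */
static void list_add(struct list_head *n, struct list_head *h)
{
	n->next = h->next;
	n->prev = h;
	h->next->prev = n;
	h->next = n;
}

/* Unlink n and leave it self-linked, so "not on a list" means n->next == n. */
static void list_del_init(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}

#define NR_MILESTONES 4 /* assumption: arbitrary small ring size */

struct milestone { struct list_head lru; unsigned long timestamp; };
struct fake_page { struct list_head lru; int id; };

struct lruvec_sketch {
	struct list_head list;                      /* head = newest, tail = oldest */
	struct milestone milestones[NR_MILESTONES]; /* embedded: addresses are known */
	int next_milestone;
	unsigned long tail_timestamp;               /* last stamp that reached the tail */
};

static void lruvec_init(struct lruvec_sketch *v)
{
	list_init(&v->list);
	for (int i = 0; i < NR_MILESTONES; i++)
		list_init(&v->milestones[i].lru);
	v->next_milestone = 0;
	v->tail_timestamp = 0;
}

/* A list node is a milestone iff it is one of this lruvec's embedded nodes. */
static int is_milestone(const struct lruvec_sketch *v, const struct list_head *n)
{
	for (int i = 0; i < NR_MILESTONES; i++)
		if (n == &v->milestones[i].lru)
			return 1;
	return 0;
}

/* Called periodically: stamp the next milestone and push it in at the head. */
static void insert_milestone(struct lruvec_sketch *v, unsigned long now)
{
	struct milestone *m = &v->milestones[v->next_milestone];

	v->next_milestone = (v->next_milestone + 1) % NR_MILESTONES;
	if (m->lru.next != &m->lru) /* still in flight: recycle it */
		list_del_init(&m->lru);
	m->timestamp = now;
	list_add(&m->lru, &v->list);
}

static void add_page(struct lruvec_sketch *v, struct fake_page *p)
{
	list_add(&p->lru, &v->list);
}

/* Take one page from the tail; milestones met on the way are consumed. */
static struct fake_page *reclaim_one(struct lruvec_sketch *v)
{
	struct list_head *n = v->list.prev;

	while (n != &v->list && is_milestone(v, n)) {
		v->tail_timestamp = container_of(n, struct milestone, lru)->timestamp;
		list_del_init(n);
		n = v->list.prev;
	}
	if (n == &v->list)
		return NULL;
	list_del_init(n);
	return container_of(n, struct fake_page, lru);
}

/* Upper bound on the age of pages now at the tail of the list. */
static unsigned long tail_age(const struct lruvec_sketch *v, unsigned long now)
{
	return now - v->tail_timestamp;
}
```

For example, with milestones stamped at t=0, 10 and 20 and a few pages added after each, reclaiming at t=30 first consumes the t=0 milestone, so tail_age() reports 30; once the t=10 milestone reaches the tail it reports 20. Recycling a milestone that is still in flight widens the gap between consecutive stamps, which is one way the irregularity between milestones mentioned in the quoted discussion degrades the estimate.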
-- 
Zlatko
