linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Vladimir Davydov <vdavydov@parallels.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Andres Lagar-Cavilla <andreslc@google.com>,
	Minchan Kim <minchan@kernel.org>,
	Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Greg Thelen <gthelen@google.com>,
	Michel Lespinasse <walken@google.com>,
	David Rientjes <rientjes@google.com>,
	Pavel Emelyanov <xemul@parallels.com>,
	Cyrill Gorcunov <gorcunov@openvz.org>,
	Jonathan Corbet <corbet@lwn.net>,
	linux-api@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-mm@kvack.org, cgroups@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH -mm v9 0/8] idle memory tracking
Date: Wed, 29 Jul 2015 19:29:08 +0300	[thread overview]
Message-ID: <20150729162908.GY8100@esperanza> (raw)
In-Reply-To: <20150729154718.GN15801@dhcp22.suse.cz>

On Wed, Jul 29, 2015 at 05:47:18PM +0200, Michal Hocko wrote:
> On Wed 29-07-15 18:28:17, Vladimir Davydov wrote:
> > On Wed, Jul 29, 2015 at 04:26:19PM +0200, Michal Hocko wrote:
> > > On Wed 29-07-15 16:59:07, Vladimir Davydov wrote:
> > > > On Wed, Jul 29, 2015 at 02:36:30PM +0200, Michal Hocko wrote:
> > > > > On Sun 19-07-15 15:31:09, Vladimir Davydov wrote:
> > > > > [...]
> > > > > > ---- USER API ----
> > > > > > 
> > > > > > The user API consists of two new proc files:
> > > > > 
> > > > > I was thinking about this for a while. I dislike the interface.  It is
> > > > > quite awkward to use - e.g. you have to read the full memory to check a
> > > > > single memcg idleness. This might turn out being a problem especially on
> > > > > large machines.
> > > > 
> > > > Yes, with this API estimating the wss of a single memory cgroup will
> > > > cost almost as much as doing this for the whole system.
> > > > 
> > > > Come to think of it, does anyone really need to estimate idleness of one
> > > > particular cgroup?
> > > 
> > > It is certainly interesting for setting the low limit.
> > 
> > Yes, but IMO there is no point in setting the low limit for one
> > particular cgroup w/o considering what's going on with the rest of the
> > system.
> 
> If you use the low limit for isolating an important load then you do not
> have to care about the others that much. All you care about is to set
> the reasonable protection level and let others to compete for the rest.

That's a use case, you're right. Well, it's a natural limitation of this
API - you just have to perform a full PFN scan then. You can avoid
costly rmap walks for the cgroups you are not interested in by filtering
them out using /proc/kpagecgroup though.

> 
> [...]
> > > > > I would assume that most users are interested only in a single number
> > > > > which tells the idleness of the system/memcg.
> > > > 
> > > > Yes, that's what I need it for - estimating containers' wss for setting
> > > > their limits accordingly.
> > > 
> > > So why don't we export the single per memcg and global knobs then?
> > > This would have few advantages. First of all it would be much easier to
> > > use, you wouldn't have to export memcg ids and finally the implementation
> > > could be changed without any user visible changes (e.g. lru vs. pfn walks),
> > > potential caching and who knows what. In other words. Michel had a
> > > single number interface AFAIR, what was the primary reason to move away
> > > from that API?
> > 
> > Because there is too much to be taken care of in the kernel with such an
> > approach and chances are high that it won't satisfy everyone. What
> > should the scan period be equal too?
> 
> No, just gather the data on the read request and let the userspace
> to decide when/how often etc. If we are clever enough we can cache
> the numbers and prevent from the walk. Write to the file and do the
> mark_idle stuff.

Still, scan rate limiting would be an issue IMO.

> 
> > Knob. How many kthreads do we want?
> > Knob. I want to keep history for last N intervals (this was a part of
> > Michel's implementation), what should N be equal to? Knob.
> 
> This all relates to the kernel thread implementation which I wasn't
> suggesting. I was referring to Michel's work which might induce that.
> I was merely referring to a single number output. Sorry about the
> confusion.

Still, what about idle stats history? I mean having info about how many
pages were idle for N scans. It might be useful for more robust/accurate
wss estimation.

> 
> > I want to be
> > able to choose between an instant scan and a scan distributed in time.
> > Knob. I want to see stats for anon/locked/file/dirty memory separately,
> 
> Why is this useful for the memcg limits setting or the wss estimation? I
> can imagine that a further drop down numbers might be interesting
> from the debugging POV but I fail to see what kind of decisions from
> userspace you would do based on them.

A couple examples that pop up in my mind:

It's difficult to make wss estimation perfect. By mlocking pages, a
workload might give a hint to the system that it will be really unhappy
if they are evicted.

One might want to consider anon pages and/or dirty pages as not idle in
order to protect them and hence avoid expensive pageout/swapout.

> 
> [...]
> > > Yes this is really tricky with the current LRU implementation. I
> > > was playing with some ideas (do some checkpoints on the way) but
> > > none of them was really working out on a busy systems. But the LRU
> > > implementation might change in the future.
> > 
> > It might. Then we could come up with a new /proc or /sys file which
> > would do the same as /proc/kpageidle, but on per LRU^w whatever-it-is
> > basis, and give people a choice which one to use.
> 
> This just leads to proc files count explosion we are seeing
> already... Proc ended up in dump ground for different things which
> didn't fit elsewhere and I am not very much happy about it to be honest.

Moving the API to memcg is not a good idea either IMO, because the
feature can actually be useful with memcg disabled, e.g. it might help
estimate if the system is over- or underloaded.

/proc/kpageidle should probably live somewhere in /sys/kernel/mm, but I
added it where similar files are located (kpagecount, kpageflags) to
keep things consistent.

Thanks,
Vladimir

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2015-07-29 16:29 UTC|newest]

Thread overview: 57+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-07-19 12:31 [PATCH -mm v9 0/8] idle memory tracking Vladimir Davydov
2015-07-19 12:31 ` [PATCH -mm v9 1/8] memcg: add page_cgroup_ino helper Vladimir Davydov
2015-07-21 23:34   ` Andrew Morton
2015-07-22  9:21     ` Vladimir Davydov
2015-07-19 12:31 ` [PATCH -mm v9 2/8] hwpoison: use page_cgroup_ino for filtering by memcg Vladimir Davydov
2015-07-21 23:34   ` Andrew Morton
2015-07-22  9:45     ` Vladimir Davydov
2015-07-19 12:31 ` [PATCH -mm v9 3/8] memcg: zap try_get_mem_cgroup_from_page Vladimir Davydov
2015-07-19 12:31 ` [PATCH -mm v9 4/8] proc: add kpagecgroup file Vladimir Davydov
2015-07-21 23:34   ` Andrew Morton
2015-07-22 10:33     ` Vladimir Davydov
2015-07-19 12:31 ` [PATCH -mm v9 5/8] mmu-notifier: add clear_young callback Vladimir Davydov
2015-07-20 18:34   ` Andres Lagar-Cavilla
2015-07-21  8:51     ` Vladimir Davydov
2015-07-22 16:33       ` Vladimir Davydov
2015-07-19 12:31 ` [PATCH -mm v9 6/8] proc: add kpageidle file Vladimir Davydov
2015-07-21 23:34   ` Andrew Morton
2015-07-22 15:20     ` Vladimir Davydov
2015-07-24 14:08   ` Paul Gortmaker
2015-07-24 14:17     ` Vladimir Davydov
2015-07-19 12:31 ` [PATCH -mm v9 7/8] proc: export idle flag via kpageflags Vladimir Davydov
2015-07-21 23:35   ` Andrew Morton
2015-07-22 16:25     ` Vladimir Davydov
2015-07-22 19:44       ` Andrew Morton
2015-07-22 20:46         ` Andres Lagar-Cavilla
2015-07-23  7:57           ` Vladimir Davydov
2015-07-19 12:31 ` [PATCH -mm v9 8/8] proc: add cond_resched to /proc/kpage* read/write loop Vladimir Davydov
2015-07-19 12:37 ` [PATCH -mm v9 0/8] idle memory tracking Vladimir Davydov
2015-07-21 21:39 ` Andres Lagar-Cavilla
2015-07-21 23:34 ` Andrew Morton
2015-07-22 16:23   ` Vladimir Davydov
2015-07-25 16:24     ` Vladimir Davydov
2015-07-27 19:18   ` Kees Cook
2015-07-27 19:25     ` Andrew Morton
2015-07-29 12:36 ` Michal Hocko
2015-07-29 13:59   ` Vladimir Davydov
2015-07-29 14:12     ` Michel Lespinasse
2015-07-29 14:13       ` Michel Lespinasse
2015-07-29 14:45       ` Vladimir Davydov
2015-07-29 15:08         ` Michel Lespinasse
2015-07-29 15:31           ` Vladimir Davydov
2015-07-29 15:34             ` Michel Lespinasse
2015-07-29 15:08         ` Michal Hocko
2015-07-29 15:36           ` Vladimir Davydov
2015-07-29 15:58             ` Michal Hocko
2015-07-29 14:26     ` Michal Hocko
2015-07-29 15:28       ` Vladimir Davydov
2015-07-29 15:47         ` Michal Hocko
2015-07-29 16:29           ` Vladimir Davydov [this message]
2015-07-29 21:30             ` Andrew Morton
2015-07-30  9:12               ` Vladimir Davydov
2015-07-30 13:01                 ` Vladimir Davydov
2015-07-31  9:34                   ` Vladimir Davydov
2015-07-30  9:07             ` Michal Hocko
2015-07-30  9:31               ` Vladimir Davydov
2015-07-29 15:55         ` Andres Lagar-Cavilla
2015-07-29 16:37           ` Vladimir Davydov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150729162908.GY8100@esperanza \
    --to=vdavydov@parallels.com \
    --cc=akpm@linux-foundation.org \
    --cc=andreslc@google.com \
    --cc=cgroups@vger.kernel.org \
    --cc=corbet@lwn.net \
    --cc=gorcunov@openvz.org \
    --cc=gthelen@google.com \
    --cc=hannes@cmpxchg.org \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=minchan@kernel.org \
    --cc=raghavendra.kt@linux.vnet.ibm.com \
    --cc=rientjes@google.com \
    --cc=walken@google.com \
    --cc=xemul@parallels.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).