From: Vladimir Davydov <vdavydov@parallels.com>
To: Minchan Kim <minchan@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Johannes Weiner <hannes@cmpxchg.org>,
Michal Hocko <mhocko@suse.cz>, Greg Thelen <gthelen@google.com>,
Michel Lespinasse <walken@google.com>,
David Rientjes <rientjes@google.com>,
Pavel Emelyanov <xemul@parallels.com>,
Cyrill Gorcunov <gorcunov@openvz.org>,
Jonathan Corbet <corbet@lwn.net>,
linux-api@vger.kernel.org, linux-doc@vger.kernel.org,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 0/3] idle memory tracking
Date: Thu, 19 Mar 2015 11:08:18 +0300
Message-ID: <20150319080818.GD29416@esperanza>
In-Reply-To: <20150319021337.GD9153@blaptop>

On Thu, Mar 19, 2015 at 11:13:37AM +0900, Minchan Kim wrote:
> On Wed, Mar 18, 2015 at 11:44:33PM +0300, Vladimir Davydov wrote:
> > 1. Write 1 to /proc/sys/vm/set_idle.
> >
> > This will set the IDLE flag for all user pages. The IDLE flag is cleared
> > when the page is read or the ACCESS/YOUNG bit is cleared in any PTE pointing
> > to the page. It is also cleared when the page is freed.
>
> We should scan all of pages periodically? I understand why you did but
> someone might not take care of unmapped pages so I hope it should be optional.
> if someone just want to catch mapped file+anon pages, he can do it
> by scanning of address space of the process he selects.
> Even, someone might want to scan just part of address space rather than
> all address space of the process. Actually, I have such scenario.

You can still estimate the working set size of a particular process, or
even of a part of its address space, by setting the IDLE bit for all
user pages but clearing refs for, and analyzing, only the pages you are
interested in. You can select those pages by scanning /proc/PID/pagemap.

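To illustrate the pagemap filtering, here is a minimal Python sketch (not
part of the patch set). It relies only on the documented pagemap layout:
one native-endian 64-bit entry per virtual page, bit 63 set when the page
is present, and the PFN in bits 0-54. The helper names `pfns_from_pagemap`
and `read_range_pfns` and the 4 KiB default page size are assumptions made
for the example.

```python
import struct

PAGEMAP_ENTRY_SIZE = 8        # one 64-bit entry per virtual page
PM_PRESENT = 1 << 63          # bit 63: page is present in RAM
PM_PFN_MASK = (1 << 55) - 1   # bits 0-54: PFN (valid only if present)


def pfns_from_pagemap(raw):
    """Decode raw pagemap bytes into the PFNs of the present pages."""
    usable = len(raw) - len(raw) % PAGEMAP_ENTRY_SIZE  # drop a partial entry
    pfns = []
    for (entry,) in struct.iter_unpack("=Q", raw[:usable]):
        if entry & PM_PRESENT:
            pfns.append(entry & PM_PFN_MASK)
    return pfns


def read_range_pfns(pid, start, length, page_size=4096):
    """Return PFNs of present pages backing [start, start + length) of pid.

    page_size=4096 is an assumption; query the real value on the target.
    """
    if length <= 0:
        return []
    first = start // page_size
    last = (start + length - 1) // page_size
    with open(f"/proc/{pid}/pagemap", "rb") as f:
        f.seek(first * PAGEMAP_ENTRY_SIZE)
        raw = f.read((last - first + 1) * PAGEMAP_ENTRY_SIZE)
    return pfns_from_pagemap(raw)
```

The resulting PFNs can then be used as indexes into /proc/kpageflags. On
recent kernels the PFN field reads as zero without CAP_SYS_ADMIN, so this
would normally have to run as root.
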
If you are concerned about performance, I don't think it would be an
issue: on my test machine, setting the IDLE bit for 20 GB of user pages
takes about 150 ms. Given that this kind of work is supposed to be done
relatively rarely (every several minutes or so), the overhead looks
negligible to me. Anyway, we can introduce /proc/PID/set_mem_idle for
setting the IDLE bit only on pages of a particular address space.

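To put the numbers above in perspective, here is a back-of-the-envelope
calculation in Python. The 4 KiB page size, reading "20 GB" as 20 GiB, and
the 5-minute scan interval are assumptions chosen for illustration; only
the 150 ms figure comes from the measurement above.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def scan_overhead(mem_bytes, scan_seconds, interval_seconds, page_size=PAGE_SIZE):
    """Return (pages scanned, nanoseconds per page, fraction of wall time)."""
    pages = mem_bytes // page_size
    ns_per_page = scan_seconds * 1e9 / pages
    duty = scan_seconds / interval_seconds
    return pages, ns_per_page, duty


# 20 GiB scanned in 150 ms, repeated every 5 minutes (interval is assumed).
pages, ns_per_page, duty = scan_overhead(20 * 2**30, 0.150, 5 * 60)
print(f"{pages} pages, {ns_per_page:.1f} ns/page, {duty:.3%} of wall time")
```

Under these assumptions the scan costs a little under 30 ns per page and
occupies about 0.05% of wall time.
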
>
> >
> > 2. Wait some time.
> >
> > 3. Write 6 to /proc/PID/clear_refs for each PID of interest.
> >
> > This will clear the IDLE flag for recently accessed pages.
> >
> > 4. Count the number of idle pages as reported by /proc/kpageflags. One may use
> > /proc/PID/pagemap and/or /proc/kpagecgroup to filter pages that belong to a
> > certain application/container.
> >
>
> Adding two new page flags? I don't know it's okay for 64bit but there is no
> room for 32bit. Please take care of 32 bit. It would be good feature for
> embedded. How about using page_ext if you couldn't make room for page->flags
> for 32bit? You would add per-page meta data in there.

For the time being, I made it dependent on 64BIT explicitly, because I
am only interested in analyzing the working set size of containers
running on big machines, but I admit one could use page_ext for storing
the additional flags when compiling for 32 bit.

>
> Your suggestion is generic so my concern is overhead. On every iteration,
> we should set/clear/investigate page flags. I don't know how much overhead
> is in there but it surely could be big if memory is big.
> Couldn't we do that at one go? Maybe, like mincore
>
> int idlecore(pid_t pid, void *addr, size_t length, unsigned char *vec)
>
> So, we could know what pages of the process[pid] were idle by vec in
> [addr, length] and reset idle of the pages for the process
> in the system call at one go.

I don't think adding yet another syscall for such a specialized feature
is a good idea. Besides, I want to keep the interface consistent with
/proc/PID/clear_refs, which IMO is perfectly suited to clearing the
IDLE flag on referenced pages. As I mentioned above, to reduce the
overhead in case the user is not interested in unmapped file pages, we
could introduce /proc/PID/set_mem_idle, though I think this should only
be done if there are complaints about /proc/sys/vm/set_idle performance.

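For reference, the four-step procedure from the cover letter could be
driven from userspace roughly as follows. This is only a sketch:
/proc/sys/vm/set_idle and the value 6 for clear_refs exist only with this
patch set applied, and the bit position of the IDLE flag in
/proc/kpageflags (`KPF_IDLE_BIT` below) is a placeholder that must be
checked against the patched kernel headers. The function names and the
`proc` parameter are made up for the example.

```python
import struct
import time

# ASSUMPTION: placeholder bit index of the IDLE flag in /proc/kpageflags.
# Verify it against the patched include/uapi/linux/kernel-page-flags.h.
KPF_IDLE_BIT = 25


def write_proc(path, value):
    """Write a short control string to a /proc file."""
    with open(path, "w") as f:
        f.write(value)


def count_idle(kpageflags_path, pfns, idle_bit=KPF_IDLE_BIT):
    """Count the PFNs whose 64-bit kpageflags entry has the idle bit set."""
    idle = 0
    with open(kpageflags_path, "rb") as f:
        for pfn in pfns:
            f.seek(pfn * 8)          # one 64-bit flags word per PFN
            data = f.read(8)
            if len(data) != 8:       # PFN beyond the end of the file
                continue
            flags = struct.unpack("=Q", data)[0]
            if (flags >> idle_bit) & 1:
                idle += 1
    return idle


def estimate_idle(pid, pfns, wait_seconds, proc="/proc"):
    """Run steps 1-4 for one process and return its number of idle pages.

    pfns is the list of page frame numbers of interest, for example
    collected beforehand from /proc/PID/pagemap.
    """
    write_proc(f"{proc}/sys/vm/set_idle", "1")     # step 1: mark all user pages idle
    time.sleep(wait_seconds)                       # step 2: let the workload run
    write_proc(f"{proc}/{pid}/clear_refs", "6")    # step 3: clear IDLE on accessed pages
    return count_idle(f"{proc}/kpageflags", pfns)  # step 4: count what is still idle
```

The PFN list is taken before the wait, so pages that are freed or
remapped in the meantime would be miscounted; a more careful tool would
re-read pagemap after step 3.
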
Thanks,
Vladimir