From: Ying Han <yinghan@google.com>
To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hugh Dickins <hughd@google.com>, Michal Hocko <mhocko@suse.cz>,
Balbir Singh <bsingharora@gmail.com>,
Rik van Riel <riel@redhat.com>,
Johannes Weiner <hannes@cmpxchg.org>, Mel Gorman <mel@csn.ul.ie>,
Pavel Emelyanov <xemul@openvz.org>,
linux-mm@kvack.org
Subject: Re: memcg: add mlock statistic in memory.stat
Date: Wed, 11 Jan 2012 16:50:09 -0800
Message-ID: <CALWz4iyuT48FWuw52bcu3B9GvHbz3c3ODcsgPzOP80UOP1Q-bQ@mail.gmail.com>
In-Reply-To: <20120112085937.ae601869.kamezawa.hiroyu@jp.fujitsu.com>
On Wed, Jan 11, 2012 at 3:59 PM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
> On Wed, 11 Jan 2012 15:17:42 -0800 (PST)
> Hugh Dickins <hughd@google.com> wrote:
>
>> On Wed, 11 Jan 2012, Ying Han wrote:
>>
>> > We have the nr_mlock stat both in meminfo and in vmstat system-wide. This
>> > patch adds an mlock field to the per-memcg memory stats. The stat enhances
>> > the metrics exported by memcg, especially when used together with the
>> > "unevictable" lru stat.
>> >
>> > --- a/include/linux/page_cgroup.h
>> > +++ b/include/linux/page_cgroup.h
>> > @@ -10,6 +10,7 @@ enum {
>> > /* flags for mem_cgroup and file and I/O status */
>> > PCG_MOVE_LOCK, /* For race between move_account v.s. following bits */
>> > PCG_FILE_MAPPED, /* page is accounted as "mapped" */
>> > + PCG_MLOCK, /* page is accounted as "mlock" */
>> > /* No lock in page_cgroup */
>> > PCG_ACCT_LRU, /* page has been accounted for (under lru_lock) */
>> > __NR_PCG_FLAGS,
>>
>> Is this really necessary? KAMEZAWA-san is engaged in trying to reduce
>> the number of PageCgroup flags, and I expect that in due course we shall
>> want to merge them in with Page flags, so adding more is unwelcome.
>> I'd have thought that with memcg_ hooks in the right places,
>> a separate flag would not be necessary?
>>
>
> Please don't ;)
>
> NR_UNEVICTABLE_LRU is not enough?
Seems not.
The unevictable lru includes more than mlock()'d pages (SHM_LOCK'd
pages, etc.). There are use cases where we'd like to know the mlock-ed
size per-cgroup. We used to get that in fake-numa based containers by
reading the value from per-node meminfo; however, we miss that
information in memcg. What do you think?
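
For example, with the proposed stat the mlock-ed size becomes readable
directly from memory.stat, next to the unevictable counter. A rough
sketch of the output (the cgroup path and the numbers are made up):

  $ cat /dev/cgroup/memory/A/memory.stat
  ...
  unevictable 8388608
  mlock 4194304
  ...

The two lines can legitimately differ, e.g. when SHM_LOCK'd shmem pages
sit on the unevictable lru without being mlock()'d.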
Thank you Hugh and Kame for the reference. Apparently I missed that
patch and I will take a look at it. (I'm still catching up on email
after vacation.)
--Ying
>
> The following is the patch I posted before to remove PCG_FILE_MAPPED.
> Then, I think you can use similar logic and make use of the UNEVICTABLE flags.
>
> ==
> A better (lockless) idea is welcome.
>
> From fd2b5822838eebbacc41f343f9eb8c6f0ad8e1cc Mon Sep 17 00:00:00 2001
> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Date: Thu, 15 Dec 2011 11:42:49 +0900
> Subject: [PATCH 2/5] memcg: safer page stat updating
>
> Now, page stat accounting is done like this.
>
> if (....set flag or some)
> update vmstat
> update memcg's stat
>
> Unlike vmstat, memcg must take care of changes in pc->mem_cgroup.
> This is done with page_cgroup_move_lock and a per-stat flag.
>
> I think FileMapped works well. But, considering updates of other
> statistics, the current logic doesn't work well. Assume the following
> case, where the flag setter races with a flag clearer:
>
>     CPU A                             CPU B
>     set flag
>     ..(delay by some preemption)..
>                                       clear flag
>                                       pc's flag is unset, so
>                                       don't update anything.
>     memcg = pc->mem_cgroup
>     set flag to pc->mem_cgroup
>     update memcg stat
>
> In this case, the stat is leaked: the page flag ends up clear but the
> memcg counter was incremented. I think memcg's accounting routine
> should not need flags of its own. To avoid relying on memcg's own
> flags, we need to prevent pc->mem_cgroup from being overwritten while
> we update the memcg stat.
>
> This patch adds
> - mem_cgroup_begin_update_page_stats(),
> - mem_cgroup_end_update_page_stats()
>
> and guarantees that pc->mem_cgroup is not overwritten between them.
> The caller should do
>
>     locked = mem_cgroup_begin_update_page_stats(page, &flags);
>     if (.... set flag or some)
>             update vmstat
>             update memcg's stat
>     mem_cgroup_end_update_page_stats(page, locked, &flags);
>
> This begin...end pair checks a counter (which is 0 in most cases) under
> rcu_read_lock/rcu_read_unlock, and takes a spinlock if required.
>
> A following patch in this series will remove the PCG_FILE_MAPPED flag.
> ---
> include/linux/memcontrol.h | 49 +++++++++++++++++++++++++++++++++++++++++-
> mm/memcontrol.c | 50 +++++++++++++++++++++++++++++--------------
> mm/rmap.c | 5 ++++
> 3 files changed, 86 insertions(+), 18 deletions(-)
>
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 598b3c9..4a61c4b 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -141,9 +141,52 @@ static inline bool mem_cgroup_disabled(void)
> return false;
> }
>
> -void mem_cgroup_update_page_stat(struct page *page,
> +/*
> + * Unlike vmstat, a page's mem_cgroup can be overwritten, so the memcg
> + * that the page stats should be accounted to is determined dynamically.
> + * Unfortunately, there are many races. To avoid them, the caller should do
> + *
> + *     locked = mem_cgroup_begin_update_page_stats(page, &flags)
> + *     if (set page flags etc)
> + *             mem_cgroup_update_page_stat(page, idx, val);
> + *     mem_cgroup_end_update_page_stats(page, locked, &flags);
> + *
> + * Between the begin and end calls, the page's mem_cgroup will never be changed.
> + */
> +void __mem_cgroup_update_page_stat(struct page *page,
> + enum mem_cgroup_page_stat_item idx,
> + int val);
> +
> +static inline void mem_cgroup_update_page_stat(struct page *page,
> enum mem_cgroup_page_stat_item idx,
> - int val);
> + int val)
> +{
> + if (mem_cgroup_disabled())
> + return;
> + __mem_cgroup_update_page_stat(page, idx, val);
> +}
> +
> +bool __mem_cgroup_begin_update_page_stats(struct page *page,
> + unsigned long *flags);
> +static inline bool
> +mem_cgroup_begin_update_page_stats(struct page *page, unsigned long *flags)
> +{
> + if (mem_cgroup_disabled())
> + return false;
> + return __mem_cgroup_begin_update_page_stats(page, flags);
> +}
> +
> +void __mem_cgroup_end_update_page_stats(struct page *page, bool locked,
> + unsigned long *flags);
> +
> +static inline void
> +mem_cgroup_end_update_page_stats(struct page *page,
> + bool locked, unsigned long *flags)
> +{
> + if (mem_cgroup_disabled())
> + return;
> + __mem_cgroup_end_update_page_stats(page, locked, flags);
> +}
>
> static inline void mem_cgroup_inc_page_stat(struct page *page,
> enum mem_cgroup_page_stat_item idx)
> @@ -171,6 +214,8 @@ void mem_cgroup_split_huge_fixup(struct page *head);
> bool mem_cgroup_bad_page_check(struct page *page);
> void mem_cgroup_print_bad_page(struct page *page);
> #endif
> +
> +
> #else /* CONFIG_CGROUP_MEM_RES_CTLR */
> struct mem_cgroup;
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index d183e1b..f4e6d5c 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1831,27 +1831,50 @@ bool mem_cgroup_handle_oom(struct mem_cgroup *memcg, gfp_t mask)
> * possibility of race condition. If there is, we take a lock.
> */
>
> -void mem_cgroup_update_page_stat(struct page *page,
> - enum mem_cgroup_page_stat_item idx, int val)
> +/*
> + * This function calls rcu_read_lock(). The lock is released by
> + * __mem_cgroup_end_update_page_stats().
> + */
> +bool __mem_cgroup_begin_update_page_stats(struct page *page, unsigned long *flags)
> {
> struct mem_cgroup *memcg;
> struct page_cgroup *pc = lookup_page_cgroup(page);
> bool need_unlock = false;
> - unsigned long uninitialized_var(flags);
>
> rcu_read_lock();
> memcg = pc->mem_cgroup;
> - if (unlikely(!memcg || !PageCgroupUsed(pc)))
> + if (!memcg || !PageCgroupUsed(pc))
> goto out;
> - /* pc->mem_cgroup is unstable ? */
> if (unlikely(mem_cgroup_stealed(memcg)) || PageTransHuge(page)) {
> - /* take a lock against to access pc->mem_cgroup */
> - move_lock_page_cgroup(pc, &flags);
> + move_lock_page_cgroup(pc, flags);
> need_unlock = true;
> - memcg = pc->mem_cgroup;
> - if (!memcg || !PageCgroupUsed(pc))
> - goto out;
> }
> +out:
> + return need_unlock;
> +}
> +EXPORT_SYMBOL(__mem_cgroup_begin_update_page_stats);
> +
> +void __mem_cgroup_end_update_page_stats(struct page *page, bool locked,
> + unsigned long *flags)
> +{
> + struct page_cgroup *pc;
> +
> + if (unlikely(locked)) {
> + pc = lookup_page_cgroup(page);
> + move_unlock_page_cgroup(pc, flags);
> + }
> + rcu_read_unlock();
> +}
> +EXPORT_SYMBOL(__mem_cgroup_end_update_page_stats);
> +
> +void __mem_cgroup_update_page_stat(struct page *page,
> + enum mem_cgroup_page_stat_item idx, int val)
> +{
> + struct page_cgroup *pc = lookup_page_cgroup(page);
> + struct mem_cgroup *memcg = pc->mem_cgroup;
> +
> + if (!memcg || !PageCgroupUsed(pc))
> + return;
>
> switch (idx) {
> case MEMCG_NR_FILE_MAPPED:
> @@ -1866,14 +1889,9 @@ void mem_cgroup_update_page_stat(struct page *page,
> }
>
> this_cpu_add(memcg->stat->count[idx], val);
> -
> -out:
> - if (unlikely(need_unlock))
> - move_unlock_page_cgroup(pc, &flags);
> - rcu_read_unlock();
> return;
> }
> -EXPORT_SYMBOL(mem_cgroup_update_page_stat);
> +EXPORT_SYMBOL(__mem_cgroup_update_page_stat);
>
> /*
> * size of first charge trial. "32" comes from vmscan.c's magic value.
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 54d140a..3648c88 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1105,10 +1105,15 @@ void page_add_new_anon_rmap(struct page *page,
> */
> void page_add_file_rmap(struct page *page)
> {
> + unsigned long flags;
> + bool locked;
> +
> + locked = mem_cgroup_begin_update_page_stats(page, &flags);
> if (atomic_inc_and_test(&page->_mapcount)) {
> __inc_zone_page_state(page, NR_FILE_MAPPED);
> mem_cgroup_inc_page_stat(page, MEMCG_NR_FILE_MAPPED);
> }
> + mem_cgroup_end_update_page_stats(page, locked, &flags);
> }
>
> /**
> --
> 1.7.4.1
>
>
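
To make the suggestion concrete, here is a rough, untested sketch of how
the mlock side could use the begin/end API together with the existing
PG_mlocked page flag instead of a new PCG_MLOCK bit, following the same
pattern as the page_add_file_rmap() hunk above. MEMCG_NR_MLOCK is
hypothetical here, just the name the mlock patch would introduce:

void mlock_vma_page(struct page *page)
{
	unsigned long flags;
	bool locked;

	BUG_ON(!PageLocked(page));

	/* pin pc->mem_cgroup so the stat lands in the right memcg */
	locked = mem_cgroup_begin_update_page_stats(page, &flags);
	if (!TestSetPageMlocked(page)) {
		inc_zone_page_state(page, NR_MLOCK);
		count_vm_event(UNEVICTABLE_PGMLOCKED);
		/* hypothetical per-memcg counter proposed in this thread */
		mem_cgroup_inc_page_stat(page, MEMCG_NR_MLOCK);
		if (!isolate_lru_page(page))
			putback_lru_page(page);
	}
	mem_cgroup_end_update_page_stats(page, locked, &flags);
}

The munlock path would need the same begin/end wrapping around its
TestClearPageMlocked() check.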
Thread overview: 12+ messages
2012-01-11 22:41 memcg: add mlock statistic in memory.stat Ying Han
2012-01-11 23:17 ` Hugh Dickins
2012-01-11 23:59 ` KAMEZAWA Hiroyuki
2012-01-12 0:50 ` Ying Han [this message]
2012-01-12 3:21 ` KAMEZAWA Hiroyuki
2012-01-12 19:13 ` Ying Han
2012-01-13 0:10 ` KAMEZAWA Hiroyuki
2012-01-13 22:27 ` Ying Han
2012-01-12 12:54 ` Michal Hocko
2012-01-12 19:09 ` Ying Han
2012-01-12 20:44 ` Michal Hocko
2012-01-13 22:24 ` Ying Han