From: Johannes Weiner <hannes@cmpxchg.org>
To: Ying Han <yinghan@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
    Andrew Morton <akpm@linux-foundation.org>,
    Michal Hocko <mhocko@suse.cz>, Mel Gorman <mel@csn.ul.ie>,
    Rik van Riel <riel@redhat.com>, Hillf Danton <dhillf@gmail.com>,
    Hugh Dickins <hughd@google.com>,
    Dan Magenheimer <dan.magenheimer@oracle.com>,
    linux-mm@kvack.org
Subject: Re: [PATCH V2] memcg: add mlock statistic in memory.stat
Date: Fri, 20 Apr 2012 01:04:21 +0200
Message-ID: <20120419230421.GC2536@cmpxchg.org>
In-Reply-To: <CALWz4iybnje0n4BODkOUYmUbzhJHhwhN4KC8RAYfpi0ppBickw@mail.gmail.com>

On Thu, Apr 19, 2012 at 03:46:08PM -0700, Ying Han wrote:
> On Thu, Apr 19, 2012 at 6:12 AM, Johannes Weiner <hannes@cmpxchg.org> wrote:
> > On Thu, Apr 19, 2012 at 09:59:20AM +0900, KAMEZAWA Hiroyuki wrote:
> >> (2012/04/19 8:33), Andrew Morton wrote:
> >>
> >> > On Wed, 18 Apr 2012 11:21:55 -0700
> >> > Ying Han <yinghan@google.com> wrote:
> >> >>  static void __free_pages_ok(struct page *page, unsigned int order)
> >> >>  {
> >> >>      unsigned long flags;
> >> >> -    int wasMlocked = __TestClearPageMlocked(page);
> >> >> +    bool locked;
> >> >>
> >> >>      if (!free_pages_prepare(page, order))
> >> >>          return;
> >> >>
> >> >>      local_irq_save(flags);
> >> >> -    if (unlikely(wasMlocked))
> >> >> +    mem_cgroup_begin_update_page_stat(page, &locked, &flags);
> >> >
> >> > hm, what's going on here. The page now has a zero refcount and is to
> >> > be returned to the buddy. But mem_cgroup_begin_update_page_stat()
> >> > assumes that the page still belongs to a memcg. I'd have thought that
> >> > any page_cgroup backreferences would have been torn down by now?
> >> >
> >> >> +    if (unlikely(__TestClearPageMlocked(page)))
> >> >>          free_page_mlock(page);
> >> >
> >>
> >>
> >> Ah, this is a problem. Now, we have the following code.
> >> ==
> >>
> >> > struct lruvec *mem_cgroup_lru_add_list(struct zone *zone, struct page *page,
> >> >                                        enum lru_list lru)
> >> > {
> >> >     struct mem_cgroup_per_zone *mz;
> >> >     struct mem_cgroup *memcg;
> >> >     struct page_cgroup *pc;
> >> >
> >> >     if (mem_cgroup_disabled())
> >> >         return &zone->lruvec;
> >> >
> >> >     pc = lookup_page_cgroup(page);
> >> >     memcg = pc->mem_cgroup;
> >> >
> >> >     /*
> >> >      * Surreptitiously switch any uncharged page to root:
> >> >      * an uncharged page off lru does nothing to secure
> >> >      * its former mem_cgroup from sudden removal.
> >> >      *
> >> >      * Our caller holds lru_lock, and PageCgroupUsed is updated
> >> >      * under page_cgroup lock: between them, they make all uses
> >> >      * of pc->mem_cgroup safe.
> >> >      */
> >> >     if (!PageCgroupUsed(pc) && memcg != root_mem_cgroup)
> >> >         pc->mem_cgroup = memcg = root_mem_cgroup;
> >>
> >> ==
> >>
> >> Then, accessing pc->mem_cgroup without checking PCG_USED bit is dangerous.
> >> It may trigger #GP because of sudden removal of memcg or because of above
> >> code, mis-accounting will happen... pc->mem_cgroup may be overwritten already.
> >>
> >> Proposal from me is calling TestClearPageMlocked(page) via mem_cgroup_uncharge().
> >>
> >> Like this.
> >> ==
> >> mem_cgroup_charge_statistics(memcg, anon, -nr_pages);
> >>
> >> /*
> >>  * Pages reach here when it's fully unmapped or dropped from file cache.
> >>  * we are under lock_page_cgroup() and have no race with memcg activities.
> >>  */
> >> if (unlikely(PageMlocked(page))) {
> >>     if (TestClearPageMlocked())
> >>         decrement counter.
> >> }
> >>
> >> ClearPageCgroupUsed(pc);
> >> ==
> >> But please check performance impact...
> >
> > This makes the lifetime rules of mlocked anon really weird.
> >
> > Plus this code runs for ALL uncharges, the unlikely() and preliminary
> > flag testing don't make it okay. It's bad that we have this in the
> > allocator, but at least it would be good to hook into that branch and
> > not add another one.
>
> Johannes,
> Can you give a more details of your last sentence above? :)

The uncharge path is a fast path that all pages go through at the end
of their lifetime. Mlocked anon pages that reach it are a tiny
fraction of them [it's just those pages that race with reclaim and
lazy-mlock while being unmapped], so I think we should do our very
best to not add any checks for them there, not even with a lot of
mitigation. It just seems badly misplaced.

On the other hand, we already HAVE a branch to deal with them, in the
page allocator. We can, and should, hook into that instead.
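
To make the shape of that concrete, here is a rough userspace sketch,
not kernel code. Every model_* name is made up for illustration, and
it assumes the page still points at its group when it is freed, which
is exactly the point under discussion in this thread. The only thing
it is meant to show is where the check lives: nothing mlock-related
in the uncharge path, and the statistic update inside the branch the
free path already takes for the rare mlocked page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these are kernel APIs. */
struct model_group {
	long nr_mlock;			/* per-group mlock statistic */
};

struct model_page {
	bool mlocked;			/* models PG_mlocked */
	struct model_group *group;	/* models pc->mem_cgroup (assumed
					 * to stay valid until the free) */
};

/* Models the uncharge fast path: runs for every page, no mlock test. */
static void model_uncharge(struct model_page *page)
{
	(void)page;			/* charge accounting elided */
}

/* Models a test-and-clear of the mlocked flag: returns the old value. */
static bool model_test_clear_mlocked(struct model_page *page)
{
	bool was = page->mlocked;

	page->mlocked = false;
	return was;
}

/* Models the free path: the rare mlocked case is already a branch here. */
static void model_free_page(struct model_page *page)
{
	if (model_test_clear_mlocked(page)) {
		/* existing slow path; the per-group stat update hooks in */
		assert(page->group != NULL);
		page->group->nr_mlock--;
	}
}
```

With this layout an ordinary page pays nothing extra in
model_uncharge(), and only a page that still has the flag set when it
is freed touches the counter.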
> > pc->mem_cgroup stays intact after the uncharge. Could we make the
> > memcg removal path wait on the mlock counter to drop to zero instead
> > and otherwise keep Ying's version?
>
> Will it delay the memcg predestroy ? I am wondering if we have page in
> mmu gather or pagevec, and they won't be freed until we flush?

Can we flush them from the waiting site?
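
Roughly what I have in mind, again as a userspace model only. The
batch below stands in for whatever defers the frees (pagevec, mmu
gather); all names and the batch size are invented for the sketch,
and how the real deferred frees would be drained is left open.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_BATCH_MAX 14		/* arbitrary size for the sketch */

/* Hypothetical stand-ins; none of these are kernel APIs. */
struct model_group {
	long nr_mlock;			/* per-group mlock statistic */
};

struct model_batch {
	size_t nr;			/* deferred frees not yet done */
	struct model_group *owner[MODEL_BATCH_MAX];
};

/* Queue a free of one mlocked page; the counter is not updated yet. */
static void model_defer_free(struct model_batch *b, struct model_group *g)
{
	assert(b->nr < MODEL_BATCH_MAX);
	b->owner[b->nr++] = g;
}

/* Perform the deferred frees, dropping each owner's mlock count. */
static void model_flush(struct model_batch *b)
{
	for (size_t i = 0; i < b->nr; i++)
		b->owner[i]->nr_mlock--;
	b->nr = 0;
}

/*
 * Models the removal path waiting for the counter to reach zero.
 * Instead of sleeping on pages that sit in a batch, it flushes the
 * batch itself.  Returns the number of flushes it needed, or -1 if
 * the counter is still nonzero with nothing left to flush (the real
 * code would have to keep waiting there).
 */
static int model_wait_for_zero(struct model_group *g, struct model_batch *b)
{
	int flushes = 0;

	while (g->nr_mlock > 0) {
		if (b->nr == 0)
			return -1;
		model_flush(b);
		flushes++;
	}
	return flushes;
}
```

The -1 case is the one that would still delay removal: a page that
is mlocked and accounted but not sitting in anything we can drain.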