From: Johannes Weiner <hannes@cmpxchg.org>
To: Sha Zhengju <handai.szj@gmail.com>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
Cgroups <cgroups@vger.kernel.org>, Michal Hocko <mhocko@suse.cz>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
Glauber Costa <glommer@gmail.com>,
Greg Thelen <gthelen@google.com>,
Wu Fengguang <fengguang.wu@intel.com>,
Andrew Morton <akpm@linux-foundation.org>,
Sha Zhengju <handai.szj@taobao.com>
Subject: Re: [PATCH V5 7/8] memcg: don't account root memcg page stats if only root exists
Date: Mon, 5 Aug 2013 17:58:52 -0400 [thread overview]
Message-ID: <20130805215852.GF1845@cmpxchg.org> (raw)
In-Reply-To: <CAFj3OHUxir9kUXgHfOb1m6LDzO8HBG68CDi3MzV54sC0jdP=iQ@mail.gmail.com>
On Fri, Aug 02, 2013 at 12:32:17PM +0800, Sha Zhengju wrote:
> On Fri, Aug 2, 2013 at 12:20 AM, Johannes Weiner <hannes@cmpxchg.org> wrote:
> > On Thu, Aug 01, 2013 at 08:00:07PM +0800, Sha Zhengju wrote:
> >> @@ -6303,6 +6360,49 @@ mem_cgroup_css_online(struct cgroup *cont)
> >> }
> >>
> >> error = memcg_init_kmem(memcg, &mem_cgroup_subsys);
> >> + if (!error) {
> >> + if (!mem_cgroup_in_use()) {
> >> + /* I'm the first non-root memcg, move global stats to root memcg.
> >> + * Memcg creating is serialized by cgroup locks(cgroup_mutex),
> >> + * so the mem_cgroup_in_use() checking is safe.
> >> + *
> >> + * We use global_page_state() to get global page stats, but
> >> + * because of the optimized inc/dec functions in SMP while
> >> + * updating each zone's stats, We may lose some numbers
> >> + * in a stock(zone->pageset->vm_stat_diff) which brings some
> >> + * inaccuracy. But places where kernel use these page stats to
> >> + * steer next decision e.g. dirty page throttling or writeback
> >> + * also use global_page_state(), so here it's enough too.
> >> + */
> >> + spin_lock(&root_mem_cgroup->pcp_counter_lock);
> >> + root_mem_cgroup->stats_base.count[MEM_CGROUP_STAT_FILE_MAPPED] =
> >> + global_page_state(NR_FILE_MAPPED);
> >> + root_mem_cgroup->stats_base.count[MEM_CGROUP_STAT_FILE_DIRTY] =
> >> + global_page_state(NR_FILE_DIRTY);
> >> + root_mem_cgroup->stats_base.count[MEM_CGROUP_STAT_WRITEBACK] =
> >> + global_page_state(NR_WRITEBACK);
> >> + spin_unlock(&root_mem_cgroup->pcp_counter_lock);
> >> + }
> >
> > If inaccuracies in these counters are okay, why do we go through an
> > elaborate locking scheme that sprinkles memcg callbacks everywhere
> > just to be 100% reliable in the rare case somebody moves memory
> > between cgroups?
>
> IMO they are not the same thing. Moving between cgroups may happen
> many times, if we ignore the inaccuracy between moving, then the
> cumulative inaccuracies caused by this back and forth behavior can
> become very large.
>
> But the transferring above occurs only once since we do it only when
> the first non-root memcg creating, so the error is at most
> zone->pageset->stat_threshold. This threshold depends on the amount of
> zone memory and the number of processors, and its maximum value is
> 125, so I thought using global_page_state() is enough. Of course we
> can add the stock to seek for greater perfection, but there are also
> possibilities of inaccuracy because of racy.
File pages may get unmapped/dirtied/put under writeback/finish
writeback between reading the stats and arming the inuse keys (before
which you are not collecting any percpu deltas).
The error is not from the percpu inaccuracies but because you don't
snapshot the counters and start accounting changes atomically wrt
ongoing counter modificatcions. This means the error is unbounded.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2013-08-05 21:59 UTC|newest]
Thread overview: 34+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-08-01 11:43 [PATCH V5 0/8] Add memcg dirty/writeback page accounting Sha Zhengju
2013-08-01 11:44 ` [PATCH 1/8] memcg: remove MEMCG_NR_FILE_MAPPED Sha Zhengju
2013-08-01 11:51 ` [PATCH V5 2/8] fs/ceph: vfs __set_page_dirty_nobuffers interface instead of doing it inside filesystem Sha Zhengju
2013-08-01 15:19 ` Yan, Zheng
2013-08-01 18:27 ` Sage Weil
2013-08-02 10:04 ` Sha Zhengju
2013-08-02 20:30 ` Sage Weil
2013-08-03 8:58 ` Sha Zhengju
2013-08-02 9:04 ` Sha Zhengju
2013-08-02 13:11 ` Yan, Zheng
2013-08-01 11:52 ` [PATCH V5 3/8] memcg: check for proper lock held in mem_cgroup_update_page_stat Sha Zhengju
2013-08-01 14:34 ` Michal Hocko
2013-08-04 18:48 ` Greg Thelen
2013-08-01 11:53 ` [PATCH V5 4/8] memcg: add per cgroup dirty pages accounting Sha Zhengju
2013-08-01 11:54 ` [PATCH V5 5/8] memcg: add per cgroup writeback " Sha Zhengju
2013-08-01 14:53 ` Michal Hocko
2013-08-03 9:25 ` Sha Zhengju
2013-08-04 10:08 ` Michal Hocko
2013-08-22 9:46 ` Fwd: " Sha Zhengju
2013-08-22 9:50 ` [PATCH 1/4] memcg: remove MEMCG_NR_FILE_MAPPED Sha Zhengju
2013-08-22 9:52 ` [PATCH 2/4] memcg: check for proper lock held in mem_cgroup_update_page_stat Sha Zhengju
2013-08-22 9:53 ` [PATCH 3/4] memcg: add per cgroup writeback pages accounting Sha Zhengju
2013-08-22 22:40 ` Andrew Morton
2013-08-23 16:11 ` Sha Zhengju
2013-08-22 9:53 ` [PATCH 4/4] memcg: Document cgroup dirty/writeback memory statistics Sha Zhengju
2013-08-04 18:51 ` [PATCH V5 5/8] memcg: add per cgroup writeback pages accounting Greg Thelen
2013-08-01 11:55 ` [PATCH V5 6/8] memcg: make nocpu_base available for non-hotplug Sha Zhengju
2013-08-01 12:00 ` [PATCH V5 7/8] memcg: don't account root memcg page stats if only root exists Sha Zhengju
2013-08-01 16:20 ` Johannes Weiner
2013-08-02 4:32 ` Sha Zhengju
2013-08-05 21:58 ` Johannes Weiner [this message]
2013-08-01 12:00 ` [PATCH V5 8/8] memcg: Document cgroup dirty/writeback memory statistics Sha Zhengju
2013-08-01 14:43 ` [PATCH V5 0/8] Add memcg dirty/writeback page accounting Michal Hocko
2013-08-03 9:30 ` Sha Zhengju
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130805215852.GF1845@cmpxchg.org \
--to=hannes@cmpxchg.org \
--cc=akpm@linux-foundation.org \
--cc=cgroups@vger.kernel.org \
--cc=fengguang.wu@intel.com \
--cc=glommer@gmail.com \
--cc=gthelen@google.com \
--cc=handai.szj@gmail.com \
--cc=handai.szj@taobao.com \
--cc=kamezawa.hiroyu@jp.fujitsu.com \
--cc=linux-mm@kvack.org \
--cc=mhocko@suse.cz \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).