linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
To: Sha Zhengju <handai.szj@gmail.com>
Cc: cgroups@vger.kernel.org, linux-mm@kvack.org, mhocko@suse.cz,
	akpm@linux-foundation.org, gthelen@google.com,
	hannes@cmpxchg.org, hughd@google.com,
	Sha Zhengju <handai.szj@taobao.com>
Subject: Re: [PATCH] memcg: simplify lock of memcg page stat accounting
Date: Tue, 29 Jan 2013 09:41:05 +0900	[thread overview]
Message-ID: <51071AA1.7000207@jp.fujitsu.com> (raw)
In-Reply-To: <1359198756-3752-1-git-send-email-handai.szj@taobao.com>

(2013/01/26 20:12), Sha Zhengju wrote:
> From: Sha Zhengju <handai.szj@taobao.com>
> 
> After removing duplicated information like PCG_*
> flags in 'struct page_cgroup'(commit 2ff76f1193), there's a problem
> between "move" and "page stat accounting"(only FILE_MAPPED is supported
> now but other stats will be added in future):
> assume CPU-A does "page stat accounting" and CPU-B does "move"
> 
> CPU-A                        CPU-B
> TestSet PG_dirty
> (delay)              	move_lock_mem_cgroup()
>                          if (PageDirty(page)) {
>                               old_memcg->nr_dirty --
>                               new_memcg->nr_dirty++
>                          }
>                          pc->mem_cgroup = new_memcg;
>                          move_unlock_mem_cgroup()
> 
> move_lock_mem_cgroup()
> memcg = pc->mem_cgroup
> memcg->nr_dirty++
> move_unlock_mem_cgroup()
> 
> while accounting information of new_memcg may be double-counted. So we
> use a bigger lock to solve this problem:  (commit: 89c06bd52f)
> 
>        move_lock_mem_cgroup() <-- mem_cgroup_begin_update_page_stat()
>        TestSetPageDirty(page)
>        update page stats (without any checks)
>        move_unlock_mem_cgroup() <-- mem_cgroup_begin_update_page_stat()
> 
> 
> But this method also has its pros and cons: at present we use two layers
> of lock avoidance(memcg_moving and memcg->moving_account) then spinlock
> on memcg (see mem_cgroup_begin_update_page_stat()), but the lock granularity
> is a little bigger that not only the critical section but also some code
> logic is in the range of locking which may be deadlock prone. As dirty
> writeack stats are added, it gets into further difficulty with the page
> cache radix tree lock and it seems that the lock requires nesting.
> (https://lkml.org/lkml/2013/1/2/48)
> 
> So in order to make the lock simpler and clearer and also avoid the 'nesting'
> problem, a choice may be:
> (CPU-A does "page stat accounting" and CPU-B does "move")
> 
>         CPU-A                        CPU-B
> 
> move_lock_mem_cgroup()
> memcg = pc->mem_cgroup
> TestSetPageDirty(page)
> move_unlock_mem_cgroup()
>                               move_lock_mem_cgroup()
>                               if (PageDirty) {
>                                    old_memcg->nr_dirty --;
>                                    new_memcg->nr_dirty ++;
>                               }
>                               pc->mem_cgroup = new_memcg
>                               move_unlock_mem_cgroup()
> 
> memcg->nr_dirty ++
> 

Hmm. no race with file truncate ?


> 
> For CPU-A, we save pc->mem_cgroup in a temporary variable just before
> TestSetPageDirty inside move_lock and then update stats if the page is set
> PG_dirty successfully. But CPU-B may do "moving" in advance that
> "old_memcg->nr_dirty --" will make old_memcg->nr_dirty incorrect but
> soon CPU-A will do "memcg->nr_dirty ++" finally that amend the stats.
> 
> Signed-off-by: Sha Zhengju <handai.szj@taobao.com>
> ---
>   include/linux/memcontrol.h |   14 +++++------
>   mm/memcontrol.c            |    8 ++-----
>   mm/rmap.c                  |   55 +++++++++++++++++++++++++++++++++-----------
>   3 files changed, 51 insertions(+), 26 deletions(-)
> 
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 0108a56..12de53b 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -164,20 +164,20 @@ static inline void mem_cgroup_end_update_page_stat(struct page *page,
>   	rcu_read_unlock();
>   }
>   
> -void mem_cgroup_update_page_stat(struct page *page,
> +void mem_cgroup_update_page_stat(struct mem_cgroup *memcg,
>   				 enum mem_cgroup_page_stat_item idx,
>   				 int val);
>   
> -static inline void mem_cgroup_inc_page_stat(struct page *page,
> +static inline void mem_cgroup_inc_page_stat(struct mem_cgroup *memcg,
>   					    enum mem_cgroup_page_stat_item idx)
>   {
> -	mem_cgroup_update_page_stat(page, idx, 1);
> +	mem_cgroup_update_page_stat(memcg, idx, 1);
>   }
>   
> -static inline void mem_cgroup_dec_page_stat(struct page *page,
> +static inline void mem_cgroup_dec_page_stat(struct mem_cgroup *memcg,
>   					    enum mem_cgroup_page_stat_item idx)
>   {
> -	mem_cgroup_update_page_stat(page, idx, -1);
> +	mem_cgroup_update_page_stat(memcg, idx, -1);
>   }
>   
>   unsigned long mem_cgroup_soft_limit_reclaim(struct zone *zone, int order,
> @@ -354,12 +354,12 @@ static inline void mem_cgroup_end_update_page_stat(struct page *page,
>   {
>   }
>   
> -static inline void mem_cgroup_inc_page_stat(struct page *page,
> +static inline void mem_cgroup_inc_page_stat(struct mem_cgroup *memcg,
>   					    enum mem_cgroup_page_stat_item idx)
>   {
>   }
>   
> -static inline void mem_cgroup_dec_page_stat(struct page *page,
> +static inline void mem_cgroup_dec_page_stat(struct mem_cgroup *memcg,
>   					    enum mem_cgroup_page_stat_item idx)
>   {
>   }
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 3817460..1b13e43 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2259,18 +2259,14 @@ void __mem_cgroup_end_update_page_stat(struct page *page, unsigned long *flags)
>   	move_unlock_mem_cgroup(pc->mem_cgroup, flags);
>   }
>   
> -void mem_cgroup_update_page_stat(struct page *page,
> +void mem_cgroup_update_page_stat(struct mem_cgroup *memcg,
>   				 enum mem_cgroup_page_stat_item idx, int val)
>   {
> -	struct mem_cgroup *memcg;
> -	struct page_cgroup *pc = lookup_page_cgroup(page);
> -	unsigned long uninitialized_var(flags);
>   
>   	if (mem_cgroup_disabled())
>   		return;
>   
> -	memcg = pc->mem_cgroup;
> -	if (unlikely(!memcg || !PageCgroupUsed(pc)))
> +	if (unlikely(!memcg))
>   		return;

I can't catch why you can do accounting without checking PCG_USED.
Could you add comments like

  * while accounting ops, mapping->tree_lock() or lock_page() is held
    and we have any race with truncation 
  etc...

>   
>   	switch (idx) {
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 59b0dca..0d74c48 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1112,13 +1112,25 @@ void page_add_file_rmap(struct page *page)
>   {
>   	bool locked;
>   	unsigned long flags;
> +	bool ret;
> +	struct mem_cgroup *memcg = NULL;
> +	struct cgroup_subsys_state *css = NULL;
>   
>   	mem_cgroup_begin_update_page_stat(page, &locked, &flags);
> -	if (atomic_inc_and_test(&page->_mapcount)) {
> +	memcg = try_get_mem_cgroup_from_page(page);

Toooooo heavy ! I can say NACK to this patch only because of this try_get().

To hold memcg alive, rcu_read_lock() will work (as current code does).

BTW, does this patch fixes the nested-lock problem ?

Thanks,
-Kame


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  parent reply	other threads:[~2013-01-29  0:41 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-01-26 11:12 [PATCH] memcg: simplify lock of memcg page stat accounting Sha Zhengju
2013-01-28 14:10 ` Michal Hocko
2013-01-29 13:44   ` Sha Zhengju
2013-01-29 15:19     ` Michal Hocko
2013-01-29  0:41 ` Kamezawa Hiroyuki [this message]
2013-01-29 10:40   ` Michal Hocko
2013-01-29 15:29   ` Sha Zhengju
2013-01-30  9:12     ` Michal Hocko
2013-01-30 14:57       ` Sha Zhengju

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=51071AA1.7000207@jp.fujitsu.com \
    --to=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=akpm@linux-foundation.org \
    --cc=cgroups@vger.kernel.org \
    --cc=gthelen@google.com \
    --cc=handai.szj@gmail.com \
    --cc=handai.szj@taobao.com \
    --cc=hannes@cmpxchg.org \
    --cc=hughd@google.com \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).