All of lore.kernel.org
 help / color / mirror / Atom feed
From: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
To: Michal Hocko <mhocko@suse.cz>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Johannes Weiner <hannes@cmpxchg.org>,
	Ying Han <yinghan@google.com>, Tejun Heo <htejun@gmail.com>,
	Glauber Costa <glommer@parallels.com>,
	Li Zefan <lizefan@huawei.com>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH v4 3/6] memcg: relax memcg iter caching
Date: Fri, 15 Feb 2013 17:19:32 +0900	[thread overview]
Message-ID: <511DEF94.1080709@jp.fujitsu.com> (raw)
In-Reply-To: <1360848396-16564-4-git-send-email-mhocko@suse.cz>

(2013/02/14 22:26), Michal Hocko wrote:
> Now that per-node-zone-priority iterator caches memory cgroups rather
> than their css ids we have to be careful and remove them from the
> iterator when they are on the way out otherwise they might live for
> unbounded amount of time even though their group is already gone (until
> the global/targeted reclaim triggers the zone under priority to find out
> the group is dead and let it to find the final rest).
> 
> We can fix this issue by relaxing rules for the last_visited memcg.
> Instead of taking a reference to the css before it is stored into
> iter->last_visited we can just store its pointer and track the number of
> removed groups from each memcg's subhierarchy.
> 
> This number would be stored into iterator everytime when a memcg is
> cached. If the iter count doesn't match the curent walker root's one we
> will start from the root again. The group counter is incremented upwards
> the hierarchy every time a group is removed.
> 
> The iter_lock can be dropped because racing iterators cannot leak
> the reference anymore as the reference count is not elevated for
> last_visited when it is cached.
> 
> Locking rules got a bit complicated by this change though. The iterator
> primarily relies on rcu read lock which makes sure that once we see
> a valid last_visited pointer then it will be valid for the whole RCU
> walk. smp_rmb makes sure that dead_count is read before last_visited
> and last_dead_count while smp_wmb makes sure that last_visited is
> updated before last_dead_count so the up-to-date last_dead_count cannot
> point to an outdated last_visited. css_tryget then makes sure that
> the last_visited is still alive in case the iteration races with the
> cached group removal (css is invalidated before mem_cgroup_css_offline
> increments dead_count).
> 
> In short:
> mem_cgroup_iter
>   rcu_read_lock()
>   dead_count = atomic_read(parent->dead_count)
>   smp_rmb()
>   if (dead_count != iter->last_dead_count)
>   	last_visited POSSIBLY INVALID -> last_visited = NULL
>   if (!css_tryget(iter->last_visited))
>   	last_visited DEAD -> last_visited = NULL
>   next = find_next(last_visited)
>   css_tryget(next)
>   css_put(last_visited) 	// css would be invalidated and parent->dead_count
>   			// incremented if this was the last reference
>   iter->last_visited = next
>   smp_wmb()
>   iter->last_dead_count = dead_count
>   rcu_read_unlock()
> 
> cgroup_rmdir
>   cgroup_destroy_locked
>    atomic_add(CSS_DEACT_BIAS, &css->refcnt) // subsequent css_tryget fail
>     mem_cgroup_css_offline
>      mem_cgroup_invalidate_reclaim_iterators
>       while(parent = parent_mem_cgroup)
>       	atomic_inc(parent->dead_count)
>    css_put(css) // last reference held by cgroup core
> 
> Spotted-by: Ying Han <yinghan@google.com>
> Original-idea-by: Johannes Weiner <hannes@cmpxchg.org>
> Signed-off-by: Michal Hocko <mhocko@suse.cz>

interesting. Thank you for your hard works.

Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)
From: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
To: Michal Hocko <mhocko@suse.cz>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Johannes Weiner <hannes@cmpxchg.org>,
	Ying Han <yinghan@google.com>, Tejun Heo <htejun@gmail.com>,
	Glauber Costa <glommer@parallels.com>,
	Li Zefan <lizefan@huawei.com>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH v4 3/6] memcg: relax memcg iter caching
Date: Fri, 15 Feb 2013 17:19:32 +0900	[thread overview]
Message-ID: <511DEF94.1080709@jp.fujitsu.com> (raw)
In-Reply-To: <1360848396-16564-4-git-send-email-mhocko@suse.cz>

(2013/02/14 22:26), Michal Hocko wrote:
> Now that per-node-zone-priority iterator caches memory cgroups rather
> than their css ids we have to be careful and remove them from the
> iterator when they are on the way out otherwise they might live for
> unbounded amount of time even though their group is already gone (until
> the global/targeted reclaim triggers the zone under priority to find out
> the group is dead and let it to find the final rest).
> 
> We can fix this issue by relaxing rules for the last_visited memcg.
> Instead of taking a reference to the css before it is stored into
> iter->last_visited we can just store its pointer and track the number of
> removed groups from each memcg's subhierarchy.
> 
> This number would be stored into iterator everytime when a memcg is
> cached. If the iter count doesn't match the curent walker root's one we
> will start from the root again. The group counter is incremented upwards
> the hierarchy every time a group is removed.
> 
> The iter_lock can be dropped because racing iterators cannot leak
> the reference anymore as the reference count is not elevated for
> last_visited when it is cached.
> 
> Locking rules got a bit complicated by this change though. The iterator
> primarily relies on rcu read lock which makes sure that once we see
> a valid last_visited pointer then it will be valid for the whole RCU
> walk. smp_rmb makes sure that dead_count is read before last_visited
> and last_dead_count while smp_wmb makes sure that last_visited is
> updated before last_dead_count so the up-to-date last_dead_count cannot
> point to an outdated last_visited. css_tryget then makes sure that
> the last_visited is still alive in case the iteration races with the
> cached group removal (css is invalidated before mem_cgroup_css_offline
> increments dead_count).
> 
> In short:
> mem_cgroup_iter
>   rcu_read_lock()
>   dead_count = atomic_read(parent->dead_count)
>   smp_rmb()
>   if (dead_count != iter->last_dead_count)
>   	last_visited POSSIBLY INVALID -> last_visited = NULL
>   if (!css_tryget(iter->last_visited))
>   	last_visited DEAD -> last_visited = NULL
>   next = find_next(last_visited)
>   css_tryget(next)
>   css_put(last_visited) 	// css would be invalidated and parent->dead_count
>   			// incremented if this was the last reference
>   iter->last_visited = next
>   smp_wmb()
>   iter->last_dead_count = dead_count
>   rcu_read_unlock()
> 
> cgroup_rmdir
>   cgroup_destroy_locked
>    atomic_add(CSS_DEACT_BIAS, &css->refcnt) // subsequent css_tryget fail
>     mem_cgroup_css_offline
>      mem_cgroup_invalidate_reclaim_iterators
>       while(parent = parent_mem_cgroup)
>       	atomic_inc(parent->dead_count)
>    css_put(css) // last reference held by cgroup core
> 
> Spotted-by: Ying Han <yinghan@google.com>
> Original-idea-by: Johannes Weiner <hannes@cmpxchg.org>
> Signed-off-by: Michal Hocko <mhocko@suse.cz>

interesting. Thank you for your hard works.

Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>



  parent reply	other threads:[~2013-02-15  8:20 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-02-14 13:26 [PATCH v4 -mm] rework mem_cgroup iterator Michal Hocko
2013-02-14 13:26 ` Michal Hocko
2013-02-14 13:26 ` [PATCH v4 1/6] memcg: keep prev's css alive for the whole mem_cgroup_iter Michal Hocko
2013-02-14 13:26   ` Michal Hocko
2013-02-14 13:26 ` [PATCH v4 2/6] memcg: rework mem_cgroup_iter to use cgroup iterators Michal Hocko
2013-02-14 13:26   ` Michal Hocko
2013-02-15  8:03   ` Kamezawa Hiroyuki
2013-02-15  8:03     ` Kamezawa Hiroyuki
2013-02-15  8:11     ` Michal Hocko
2013-02-15  8:11       ` Michal Hocko
2013-02-15  8:13       ` Kamezawa Hiroyuki
2013-02-15  8:13         ` Kamezawa Hiroyuki
2013-02-14 13:26 ` [PATCH v4 3/6] memcg: relax memcg iter caching Michal Hocko
2013-02-14 13:26   ` Michal Hocko
2013-02-15  8:08   ` Michal Hocko
2013-02-15  8:08     ` Michal Hocko
2013-02-15  8:19   ` Kamezawa Hiroyuki [this message]
2013-02-15  8:19     ` Kamezawa Hiroyuki
2013-02-14 13:26 ` [PATCH v4 4/6] memcg: simplify mem_cgroup_iter Michal Hocko
2013-02-14 13:26   ` Michal Hocko
2013-02-15  8:24   ` Kamezawa Hiroyuki
2013-02-15  8:24     ` Kamezawa Hiroyuki
2013-02-14 13:26 ` [PATCH v4 5/6] memcg: further " Michal Hocko
2013-02-14 13:26   ` Michal Hocko
2013-02-14 13:26 ` [PATCH v4 6/6] cgroup: remove css_get_next Michal Hocko
2013-02-14 13:26   ` Michal Hocko
2013-03-13 15:08 ` [PATCH v4 -mm] rework mem_cgroup iterator Michal Hocko
2013-03-13 15:08   ` Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=511DEF94.1080709@jp.fujitsu.com \
    --to=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=akpm@linux-foundation.org \
    --cc=glommer@parallels.com \
    --cc=hannes@cmpxchg.org \
    --cc=htejun@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lizefan@huawei.com \
    --cc=mhocko@suse.cz \
    --cc=yinghan@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.