From: Michal Hocko <mhocko@suse.cz>
To: Ying Han <yinghan@google.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Tejun Heo <htejun@gmail.com>,
Glauber Costa <glommer@parallels.com>,
Li Zefan <lizefan@huawei.com>
Subject: Re: [patch v2 3/6] memcg: rework mem_cgroup_iter to use cgroup iterators
Date: Fri, 14 Dec 2012 13:37:38 +0100
Message-ID: <20121214123738.GH6898@dhcp22.suse.cz>
In-Reply-To: <20121212192441.GD10374@dhcp22.suse.cz>

On Wed 12-12-12 20:24:41, Michal Hocko wrote:
> On Wed 12-12-12 10:06:52, Michal Hocko wrote:
> > On Tue 11-12-12 14:36:10, Ying Han wrote:
> [...]
> > > One exception is mem_cgroup_iter_break(), where the loop terminates
> > > with *leaked* refcnt and that is what the iter_break() needs to clean
> > > up. We can not rely on the next caller of the loop since it might
> > > never happen.
> >
> > Yes, this is true and I already have a half baked patch for that. I
> > haven't posted it yet but it basically checks all node-zone-prio
> > last_visited and removes itself from them on the way out in pre_destroy
> > callback (I just need to cleanup "find a new last_visited" part and will
> > post it).
>
> And a half baked patch - just compile tested
Please ignore this patch. It is totally bogus.
> ---
> From 1c976c079c383175c679e00115aee0ab8e215bf2 Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.cz>
> Date: Tue, 11 Dec 2012 21:02:39 +0100
> Subject: [PATCH] NOT READY YET - just compile tested
>
> memcg: remove memcg from the reclaim iterators
>
> Now that the per-node-zone-priority iterator caches memory cgroups rather
> than their css ids, we have to be careful to remove them from the
> iterator when they are on the way out. Otherwise they might hang around
> for an unbounded amount of time (until global reclaim hits the zone at
> that priority, finds out that the group is dead and lets it find its
> final rest).
>
> This is solved by hooking into mem_cgroup_pre_destroy and checking all
> per-node-zone-priority iterators. If the current memcg is found in
> iter->last_visited then it is replaced by its left sibling, or by its
> parent if it has no left sibling. This guarantees that no group gets
> more reclaiming than necessary and the next iteration will continue
> seamlessly.
>
> Spotted-by: Ying Han <yinghan@google.com>
> Not-signed-off-by-yet: Michal Hocko <mhocko@suse.cz>
> ---
> mm/memcontrol.c | 38 ++++++++++++++++++++++++++++++++++++++
> 1 file changed, 38 insertions(+)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 7134148..286db74 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -6213,12 +6213,50 @@ free_out:
>  	return ERR_PTR(error);
>  }
>
> +static void mem_cgroup_remove_cached(struct mem_cgroup *memcg)
> +{
> +	int node, zone;
> +
> +	for_each_node(node) {
> +		struct mem_cgroup_per_node *pn = memcg->info.nodeinfo[node];
> +		int prio;
> +
> +		for (zone = 0; zone < MAX_NR_ZONES; zone++) {
> +			struct mem_cgroup_per_zone *mz;
> +
> +			mz = &pn->zoneinfo[zone];
> +			for (prio = 0; prio < DEF_PRIORITY + 1; prio++) {
> +				struct mem_cgroup_reclaim_iter *iter;
> +
> +				iter = &mz->reclaim_iter[prio];
> +				rcu_read_lock();
> +				spin_lock(&iter->iter_lock);
> +				if (iter->last_visited == memcg) {
> +					struct cgroup *cgroup, *prev;
> +
> +					cgroup = memcg->css.cgroup;
> +					prev = list_entry_rcu(cgroup->sibling.prev, struct cgroup, sibling);
> +					if (&prev->sibling == &prev->parent->children)
> +						prev = prev->parent;
> +					iter->last_visited = mem_cgroup_from_cont(prev);
> +
> +					/* TODO can we do this? */
> +					css_put(&memcg->css);
> +				}
> +				spin_unlock(&iter->iter_lock);
> +				rcu_read_unlock();
> +			}
> +		}
> +	}
> +}
> +
>  static void mem_cgroup_pre_destroy(struct cgroup *cont)
>  {
>  	struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
>
>  	mem_cgroup_reparent_charges(memcg);
>  	mem_cgroup_destroy_all_caches(memcg);
> +	mem_cgroup_remove_cached(memcg);
>  }
>
>  static void mem_cgroup_destroy(struct cgroup *cont)
> --
> 1.7.10.4
>
> --
> Michal Hocko
> SUSE Labs
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
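
For readers following the "left sibling or its parent otherwise" rule from
the quoted changelog, here is a minimal user-space sketch of that choice.
It is not kernel code: struct node, node_init() and replacement_for() are
hypothetical names made up for illustration, and the list handling is a
hand-rolled stand-in for the kernel's list_head helpers. One deliberate
choice of this sketch is to compare the raw list pointer against the
parent's list head before converting it back to a node, so the list head
itself is never treated as a node.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly-linked list, modelled on the kernel's list_head. */
struct list_head {
	struct list_head *prev, *next;
};

/* Hypothetical stand-in for a cgroup; NOT the kernel's struct cgroup. */
struct node {
	const char *name;
	struct node *parent;
	struct list_head children;	/* head of this node's child list */
	struct list_head sibling;	/* link in the parent's child list */
};

/* Convert a pointer to the embedded sibling link back to its node. */
#define node_of(ptr) \
	((struct node *)((char *)(ptr) - offsetof(struct node, sibling)))

/* Initialise a node and append it to the tail of its parent's child list. */
static void node_init(struct node *n, const char *name, struct node *parent)
{
	assert(n != NULL);
	n->name = name;
	n->parent = parent;
	n->children.prev = n->children.next = &n->children;
	n->sibling.prev = n->sibling.next = &n->sibling;
	if (parent) {
		struct list_head *head = &parent->children;

		n->sibling.prev = head->prev;
		n->sibling.next = head;
		head->prev->next = &n->sibling;
		head->prev = &n->sibling;
	}
}

/*
 * Pick the node that should take over a cached position when n goes away:
 * the left sibling if there is one, otherwise the parent. Returns NULL
 * for the root, which has neither.
 */
static struct node *replacement_for(struct node *n)
{
	assert(n != NULL);
	if (!n->parent)
		return NULL;
	/* Check for the list head before treating the entry as a node. */
	if (n->sibling.prev == &n->parent->children)
		return n->parent;
	return node_of(n->sibling.prev);
}
```

With a root that has children "a" and "b" added in that order, the sketch
picks "a" as the replacement for "b" and the root as the replacement for
"a". Locking, RCU and reference counting are left out on purpose.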
--
Michal Hocko
SUSE Labs
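
The reference-count concern raised at the top of the thread (a loop that
terminates early must hand its reference back, because the next caller
might never come) can be illustrated with a small user-space model. This
is only a sketch under simplifying assumptions: struct group, group_get(),
group_put(), group_iter() and group_iter_break() are hypothetical names,
the hierarchy is flattened into a singly linked walk order, and a plain
integer stands in for the css reference count.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a memcg with a plain integer refcount. */
struct group {
	int refcnt;
	struct group *next;	/* flattened walk order (an assumption) */
};

static void group_get(struct group *g)
{
	g->refcnt++;
}

static void group_put(struct group *g)
{
	assert(g->refcnt > 0);
	g->refcnt--;
}

/*
 * Return the group after prev (or first when prev is NULL) with a
 * reference held on it, and drop the reference held on prev.
 */
static struct group *group_iter(struct group *first, struct group *prev)
{
	struct group *next = prev ? prev->next : first;

	if (prev)
		group_put(prev);
	if (next)
		group_get(next);
	return next;
}

/*
 * Leave the walk early. The caller still holds a reference on pos,
 * and nobody else will ever drop it, so it has to be released here.
 */
static void group_iter_break(struct group *pos)
{
	if (pos)
		group_put(pos);
}
```

In this model a walk that stops at the second group and calls
group_iter_break() leaves every refcount at its starting value, while a
walk that simply stops calling group_iter() leaves the last returned
group with one extra reference.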