From: Johannes Weiner <hannes@cmpxchg.org>
To: Michal Hocko <mhocko@suse.cz>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
Ying Han <yinghan@google.com>, Tejun Heo <htejun@gmail.com>,
Glauber Costa <glommer@parallels.com>,
Li Zefan <lizefan@huawei.com>
Subject: Re: [PATCH v3 4/7] memcg: remove memcg from the reclaim iterators
Date: Mon, 11 Feb 2013 12:56:19 -0500 [thread overview]
Message-ID: <20130211175619.GC13218@cmpxchg.org> (raw)
In-Reply-To: <20130211151649.GD19922@dhcp22.suse.cz>
On Mon, Feb 11, 2013 at 04:16:49PM +0100, Michal Hocko wrote:
> On Fri 08-02-13 14:33:18, Johannes Weiner wrote:
> [...]
> > for each in hierarchy:
> > for each node:
> > for each zone:
> > for each reclaim priority:
> >
> > every time a cgroup is destroyed. I don't think such a hammer is
> > justified in general, let alone for consolidating code a little.
> >
> > Can we invalidate the position cache lazily? Have a global "cgroup
> > destruction" counter and store a snapshot of that counter whenever we
> > put a cgroup pointer in the position cache. We only use the cached
> > pointer if that counter has not changed in the meantime, so we know
> > that the cgroup still exists.
>
> Currently we have:
> rcu_read_lock() // keeps cgroup links safe
> iter->iter_lock // keeps selection exclusive for a specific iterator
> 1) global_counter == iter_counter
> 2) css_tryget(cached_memcg) // check it is still alive
> rcu_read_unlock()
>
> What would protect us from races when css would disappear between 1 and
> 2?
rcu
> css is invalidated from worker context scheduled from __css_put and it
> is using dentry locking which we surely do not want to pull here. We
> could hook into css_offline which is called with cgroup_mutex but we
> cannot use this one here because it is no longer exported and Tejun
> would kill us for that.
> So we can add a new global memcg internal lock to do this atomically.
> Ohh, this is getting uglier...
A racing final css_put() means that the tryget fails, but our RCU read
lock keeps the CSS allocated. If the dead_count is uptodate, it means
that the rcu read lock was acquired before the synchronize_rcu()
before the css is freed.
> > It is pretty pretty imprecise and we invalidate the whole cache every
> > time a cgroup is destroyed, but I think that should be okay.
>
> I am not sure this is OK because this gives an indirect way of
> influencing reclaim in one hierarchy by another one which opens a door
> for regressions (or malicious over-reclaim in the extreme case).
> So I do not like this very much.
>
> > If not, better ideas are welcome.
>
> Maybe we could keep the counter per memcg but that would mean that we
> would need to go up the hierarchy as well. We wouldn't have to go over
> node-zone-priority cleanup so it would be much more lightweight.
>
> I am not sure this is necessarily better than explicit cleanup because
> it brings yet another kind of generation number to the game but I guess
> I can live with it if people really thing the relaxed way is much
> better.
> What do you think about the patch below (untested yet)?
Better, but I think you can get rid of both locks:
mem_cgroup_iter:
rcu_read_lock()
if atomic_read(&root->dead_count) == iter->dead_count:
smp_rmb()
if tryget(iter->position):
position = iter->position
memcg = find_next(postion)
css_put(position)
iter->position = memcg
smp_wmb() /* Write position cache BEFORE marking it uptodate */
iter->dead_count = atomic_read(&root->dead_count)
rcu_read_unlock()
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
WARNING: multiple messages have this Message-ID (diff)
From: Johannes Weiner <hannes@cmpxchg.org>
To: Michal Hocko <mhocko@suse.cz>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
Ying Han <yinghan@google.com>, Tejun Heo <htejun@gmail.com>,
Glauber Costa <glommer@parallels.com>,
Li Zefan <lizefan@huawei.com>
Subject: Re: [PATCH v3 4/7] memcg: remove memcg from the reclaim iterators
Date: Mon, 11 Feb 2013 12:56:19 -0500 [thread overview]
Message-ID: <20130211175619.GC13218@cmpxchg.org> (raw)
In-Reply-To: <20130211151649.GD19922@dhcp22.suse.cz>
On Mon, Feb 11, 2013 at 04:16:49PM +0100, Michal Hocko wrote:
> On Fri 08-02-13 14:33:18, Johannes Weiner wrote:
> [...]
> > for each in hierarchy:
> > for each node:
> > for each zone:
> > for each reclaim priority:
> >
> > every time a cgroup is destroyed. I don't think such a hammer is
> > justified in general, let alone for consolidating code a little.
> >
> > Can we invalidate the position cache lazily? Have a global "cgroup
> > destruction" counter and store a snapshot of that counter whenever we
> > put a cgroup pointer in the position cache. We only use the cached
> > pointer if that counter has not changed in the meantime, so we know
> > that the cgroup still exists.
>
> Currently we have:
> rcu_read_lock() // keeps cgroup links safe
> iter->iter_lock // keeps selection exclusive for a specific iterator
> 1) global_counter == iter_counter
> 2) css_tryget(cached_memcg) // check it is still alive
> rcu_read_unlock()
>
> What would protect us from races when css would disappear between 1 and
> 2?
rcu
> css is invalidated from worker context scheduled from __css_put and it
> is using dentry locking which we surely do not want to pull here. We
> could hook into css_offline which is called with cgroup_mutex but we
> cannot use this one here because it is no longer exported and Tejun
> would kill us for that.
> So we can add a new global memcg internal lock to do this atomically.
> Ohh, this is getting uglier...
A racing final css_put() means that the tryget fails, but our RCU read
lock keeps the CSS allocated. If the dead_count is uptodate, it means
that the rcu read lock was acquired before the synchronize_rcu()
before the css is freed.
> > It is pretty pretty imprecise and we invalidate the whole cache every
> > time a cgroup is destroyed, but I think that should be okay.
>
> I am not sure this is OK because this gives an indirect way of
> influencing reclaim in one hierarchy by another one which opens a door
> for regressions (or malicious over-reclaim in the extreme case).
> So I do not like this very much.
>
> > If not, better ideas are welcome.
>
> Maybe we could keep the counter per memcg but that would mean that we
> would need to go up the hierarchy as well. We wouldn't have to go over
> node-zone-priority cleanup so it would be much more lightweight.
>
> I am not sure this is necessarily better than explicit cleanup because
> it brings yet another kind of generation number to the game but I guess
> I can live with it if people really thing the relaxed way is much
> better.
> What do you think about the patch below (untested yet)?
Better, but I think you can get rid of both locks:
mem_cgroup_iter:
rcu_read_lock()
if atomic_read(&root->dead_count) == iter->dead_count:
smp_rmb()
if tryget(iter->position):
position = iter->position
memcg = find_next(postion)
css_put(position)
iter->position = memcg
smp_wmb() /* Write position cache BEFORE marking it uptodate */
iter->dead_count = atomic_read(&root->dead_count)
rcu_read_unlock()
next prev parent reply other threads:[~2013-02-11 17:56 UTC|newest]
Thread overview: 78+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-01-03 17:54 [PATCH v3 0/7] rework mem_cgroup iterator Michal Hocko
2013-01-03 17:54 ` Michal Hocko
2013-01-03 17:54 ` [PATCH v3 1/7] memcg: synchronize per-zone iterator access by a spinlock Michal Hocko
2013-01-03 17:54 ` Michal Hocko
2013-01-03 17:54 ` [PATCH v3 2/7] memcg: keep prev's css alive for the whole mem_cgroup_iter Michal Hocko
2013-01-03 17:54 ` Michal Hocko
2013-01-03 17:54 ` [PATCH v3 3/7] memcg: rework mem_cgroup_iter to use cgroup iterators Michal Hocko
2013-01-03 17:54 ` Michal Hocko
2013-01-03 17:54 ` [PATCH v3 4/7] memcg: remove memcg from the reclaim iterators Michal Hocko
2013-01-03 17:54 ` Michal Hocko
2013-01-07 6:18 ` Kamezawa Hiroyuki
2013-01-07 6:18 ` Kamezawa Hiroyuki
2013-02-08 19:33 ` Johannes Weiner
2013-02-08 19:33 ` Johannes Weiner
2013-02-11 15:16 ` Michal Hocko
2013-02-11 15:16 ` Michal Hocko
2013-02-11 17:56 ` Johannes Weiner [this message]
2013-02-11 17:56 ` Johannes Weiner
2013-02-11 19:29 ` Michal Hocko
2013-02-11 19:29 ` Michal Hocko
2013-02-11 19:58 ` Johannes Weiner
2013-02-11 19:58 ` Johannes Weiner
2013-02-11 21:27 ` Michal Hocko
2013-02-11 21:27 ` Michal Hocko
2013-02-11 22:07 ` Michal Hocko
2013-02-11 22:07 ` Michal Hocko
2013-02-11 22:39 ` Johannes Weiner
2013-02-11 22:39 ` Johannes Weiner
2013-02-12 9:54 ` Michal Hocko
2013-02-12 9:54 ` Michal Hocko
2013-02-12 15:10 ` Johannes Weiner
2013-02-12 15:10 ` Johannes Weiner
2013-02-12 15:43 ` Michal Hocko
2013-02-12 15:43 ` Michal Hocko
2013-02-12 16:10 ` Paul E. McKenney
2013-02-12 16:10 ` Paul E. McKenney
2013-02-12 17:25 ` Johannes Weiner
2013-02-12 17:25 ` Johannes Weiner
2013-02-12 18:31 ` Paul E. McKenney
2013-02-12 18:31 ` Paul E. McKenney
2013-02-12 19:53 ` Johannes Weiner
2013-02-12 19:53 ` Johannes Weiner
2013-02-13 9:51 ` Michal Hocko
2013-02-13 9:51 ` Michal Hocko
2013-02-12 17:56 ` Michal Hocko
2013-02-12 17:56 ` Michal Hocko
2013-02-12 16:13 ` Michal Hocko
2013-02-12 16:13 ` Michal Hocko
2013-02-12 16:24 ` Michal Hocko
2013-02-12 16:24 ` Michal Hocko
2013-02-12 16:37 ` Michal Hocko
2013-02-12 16:37 ` Michal Hocko
2013-02-12 16:41 ` Johannes Weiner
2013-02-12 16:41 ` Johannes Weiner
2013-02-12 17:12 ` Michal Hocko
2013-02-12 17:12 ` Michal Hocko
2013-02-12 17:37 ` Johannes Weiner
2013-02-12 17:37 ` Johannes Weiner
2013-02-13 8:11 ` Glauber Costa
2013-02-13 8:11 ` Glauber Costa
2013-02-13 10:38 ` Michal Hocko
2013-02-13 10:38 ` Michal Hocko
2013-02-13 10:34 ` Michal Hocko
2013-02-13 10:34 ` Michal Hocko
2013-02-13 12:56 ` Michal Hocko
2013-02-13 12:56 ` Michal Hocko
2013-02-12 16:33 ` Johannes Weiner
2013-02-12 16:33 ` Johannes Weiner
2013-01-03 17:54 ` [PATCH v3 5/7] memcg: simplify mem_cgroup_iter Michal Hocko
2013-01-03 17:54 ` Michal Hocko
2013-01-03 17:54 ` [PATCH v3 6/7] memcg: further " Michal Hocko
2013-01-03 17:54 ` Michal Hocko
2013-01-03 17:54 ` [PATCH v3 7/7] cgroup: remove css_get_next Michal Hocko
2013-01-03 17:54 ` Michal Hocko
2013-01-04 3:42 ` Li Zefan
2013-01-04 3:42 ` Li Zefan
2013-01-23 12:52 ` [PATCH v3 0/7] rework mem_cgroup iterator Michal Hocko
2013-01-23 12:52 ` Michal Hocko
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130211175619.GC13218@cmpxchg.org \
--to=hannes@cmpxchg.org \
--cc=glommer@parallels.com \
--cc=htejun@gmail.com \
--cc=kamezawa.hiroyu@jp.fujitsu.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=lizefan@huawei.com \
--cc=mhocko@suse.cz \
--cc=yinghan@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.