From: Vladimir Davydov <vdavydov@parallels.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>,
Andrew Morton <akpm@linux-foundation.org>,
Tejun Heo <tj@kernel.org>, Oleg Nesterov <oleg@redhat.com>,
Greg Thelen <gthelen@google.com>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
linux-mm@kvack.org, LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 7/8] memcg: get rid of mm_struct::owner
Date: Fri, 10 Jul 2015 10:54:00 +0300 [thread overview]
Message-ID: <20150710075400.GN2436@esperanza> (raw)
In-Reply-To: <20150709140941.GG13872@dhcp22.suse.cz>
On Thu, Jul 09, 2015 at 04:09:41PM +0200, Michal Hocko wrote:
> On Wed 08-07-15 20:32:51, Vladimir Davydov wrote:
> > On Wed, Jul 08, 2015 at 02:27:51PM +0200, Michal Hocko wrote:
[...]
> > > @@ -474,7 +519,7 @@ static inline void mem_cgroup_count_vm_event(struct mm_struct *mm,
> > > return;
> > >
> > > rcu_read_lock();
> > > - memcg = mem_cgroup_from_task(rcu_dereference(mm->owner));
> > > + memcg = rcu_dereference(mm->memcg);
> > > if (unlikely(!memcg))
> > > goto out;
> > >
> >
> > If I'm not mistaken, mm->memcg equals NULL for any task in the root
> > memory cgroup
>
> right
>
> > (BTW, it it's true, it's worth mentioning in the comment
> > to mm->memcg definition IMO). As a result, we won't account the stats
> > for such tasks, will we?
>
> well spotted! This is certainly a bug. There are more places which are
> checking for mm->memcg being NULL and falling back to root_mem_cgroup. I
> think it would be better to simply use root_mem_cgroup right away. We
> can setup init_mm.memcg = root_mem_cgroup during initialization and be
> done with it. What do you think? The diff is in the very end of the
> email (completely untested yet).
I'd prefer initializing init_mm.memcg to root_mem_cgroup. This way we
wouldn't have to check whether mm->memcg is NULL or not here and there,
which would make the code cleaner IMO.
[...]
> > > @@ -4932,14 +4943,26 @@ static void mem_cgroup_move_task(struct cgroup_subsys_state *css,
> > > {
> > > struct task_struct *p = cgroup_taskset_first(tset);
> > > struct mm_struct *mm = get_task_mm(p);
> > > + struct mem_cgroup *old_memcg = NULL;
> > >
> > > if (mm) {
> > > + old_memcg = READ_ONCE(mm->memcg);
> > > + __mm_set_memcg(mm, mem_cgroup_from_css(css));
> > > +
> > > if (mc.to)
> > > mem_cgroup_move_charge(mm);
> > > mmput(mm);
> > > }
> > > if (mc.to)
> > > mem_cgroup_clear_mc();
> > > +
> > > + /*
> > > + * Be careful and drop the reference only after we are done because
> > > + * p's task_css memcg might be different from p->memcg and nothing else
> > > + * might be pinning the old memcg.
> > > + */
> > > + if (old_memcg)
> > > + css_put(&old_memcg->css);
> >
> > Please explain why the following race is impossible:
> >
> > CPU0 CPU1
> > ---- ----
> > [current = T]
> > dup_mm or exec_mmap
> > mm_inherit_memcg
> > memcg = current->mm->memcg;
> > mem_cgroup_move_task
> > p = T;
> > mm = get_task_mm(p);
> > old_memcg = mm->memcg;
> > css_put(&old_memcg->css);
> > /* old_memcg can be freed now */
> > css_get(memcg); /* BUG */
>
> I guess you are right. The window seem to be very small but CPU0 simly
> might get preempted by the moving task and so even cgroup pinning
> wouldn't help here.
>
> I guess we need
> ---
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index b3e7e30b5a74..6fbd33273b6d 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -300,9 +300,17 @@ void __mm_set_memcg(struct mm_struct *mm, struct mem_cgroup *memcg)
> static inline
> void mm_inherit_memcg(struct mm_struct *newmm, struct mm_struct *oldmm)
> {
> - struct mem_cgroup *memcg = oldmm->memcg;
> + struct mem_cgroup *memcg;
>
> + /*
> + * oldmm might be under move and just replacing its memcg (see
> + * mem_cgroup_move_task) so we have to protect from its memcg
> + * going away between we dereference and take a reference.
> + */
> + rcu_read_lock();
> + memcg = rcu_dereference(oldmm->memcg);
> __mm_set_memcg(newmm, memcg);
If it's safe to call css_get under rcu_read_lock, then it's OK,
otherwise we probably need to use a do {} while (!css_tryget(memcg))
loop in __mm_set_memcg.
> + rcu_read_unlock();
> }
>
> /**
>
>
> Make sure that all tasks have non NULL memcg.
[...]
That looks better to me.
Thanks,
Vladimir
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
WARNING: multiple messages have this Message-ID (diff)
From: Vladimir Davydov <vdavydov@parallels.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>,
Andrew Morton <akpm@linux-foundation.org>,
Tejun Heo <tj@kernel.org>, Oleg Nesterov <oleg@redhat.com>,
Greg Thelen <gthelen@google.com>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
<linux-mm@kvack.org>, LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 7/8] memcg: get rid of mm_struct::owner
Date: Fri, 10 Jul 2015 10:54:00 +0300 [thread overview]
Message-ID: <20150710075400.GN2436@esperanza> (raw)
In-Reply-To: <20150709140941.GG13872@dhcp22.suse.cz>
On Thu, Jul 09, 2015 at 04:09:41PM +0200, Michal Hocko wrote:
> On Wed 08-07-15 20:32:51, Vladimir Davydov wrote:
> > On Wed, Jul 08, 2015 at 02:27:51PM +0200, Michal Hocko wrote:
[...]
> > > @@ -474,7 +519,7 @@ static inline void mem_cgroup_count_vm_event(struct mm_struct *mm,
> > > return;
> > >
> > > rcu_read_lock();
> > > - memcg = mem_cgroup_from_task(rcu_dereference(mm->owner));
> > > + memcg = rcu_dereference(mm->memcg);
> > > if (unlikely(!memcg))
> > > goto out;
> > >
> >
> > If I'm not mistaken, mm->memcg equals NULL for any task in the root
> > memory cgroup
>
> right
>
> > (BTW, it it's true, it's worth mentioning in the comment
> > to mm->memcg definition IMO). As a result, we won't account the stats
> > for such tasks, will we?
>
> well spotted! This is certainly a bug. There are more places which are
> checking for mm->memcg being NULL and falling back to root_mem_cgroup. I
> think it would be better to simply use root_mem_cgroup right away. We
> can setup init_mm.memcg = root_mem_cgroup during initialization and be
> done with it. What do you think? The diff is in the very end of the
> email (completely untested yet).
I'd prefer initializing init_mm.memcg to root_mem_cgroup. This way we
wouldn't have to check whether mm->memcg is NULL or not here and there,
which would make the code cleaner IMO.
[...]
> > > @@ -4932,14 +4943,26 @@ static void mem_cgroup_move_task(struct cgroup_subsys_state *css,
> > > {
> > > struct task_struct *p = cgroup_taskset_first(tset);
> > > struct mm_struct *mm = get_task_mm(p);
> > > + struct mem_cgroup *old_memcg = NULL;
> > >
> > > if (mm) {
> > > + old_memcg = READ_ONCE(mm->memcg);
> > > + __mm_set_memcg(mm, mem_cgroup_from_css(css));
> > > +
> > > if (mc.to)
> > > mem_cgroup_move_charge(mm);
> > > mmput(mm);
> > > }
> > > if (mc.to)
> > > mem_cgroup_clear_mc();
> > > +
> > > + /*
> > > + * Be careful and drop the reference only after we are done because
> > > + * p's task_css memcg might be different from p->memcg and nothing else
> > > + * might be pinning the old memcg.
> > > + */
> > > + if (old_memcg)
> > > + css_put(&old_memcg->css);
> >
> > Please explain why the following race is impossible:
> >
> > CPU0 CPU1
> > ---- ----
> > [current = T]
> > dup_mm or exec_mmap
> > mm_inherit_memcg
> > memcg = current->mm->memcg;
> > mem_cgroup_move_task
> > p = T;
> > mm = get_task_mm(p);
> > old_memcg = mm->memcg;
> > css_put(&old_memcg->css);
> > /* old_memcg can be freed now */
> > css_get(memcg); /* BUG */
>
> I guess you are right. The window seem to be very small but CPU0 simly
> might get preempted by the moving task and so even cgroup pinning
> wouldn't help here.
>
> I guess we need
> ---
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index b3e7e30b5a74..6fbd33273b6d 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -300,9 +300,17 @@ void __mm_set_memcg(struct mm_struct *mm, struct mem_cgroup *memcg)
> static inline
> void mm_inherit_memcg(struct mm_struct *newmm, struct mm_struct *oldmm)
> {
> - struct mem_cgroup *memcg = oldmm->memcg;
> + struct mem_cgroup *memcg;
>
> + /*
> + * oldmm might be under move and just replacing its memcg (see
> + * mem_cgroup_move_task) so we have to protect from its memcg
> + * going away between we dereference and take a reference.
> + */
> + rcu_read_lock();
> + memcg = rcu_dereference(oldmm->memcg);
> __mm_set_memcg(newmm, memcg);
If it's safe to call css_get under rcu_read_lock, then it's OK,
otherwise we probably need to use a do {} while (!css_tryget(memcg))
loop in __mm_set_memcg.
> + rcu_read_unlock();
> }
>
> /**
>
>
> Make sure that all tasks have non NULL memcg.
[...]
That looks better to me.
Thanks,
Vladimir
next prev parent reply other threads:[~2015-07-10 7:54 UTC|newest]
Thread overview: 68+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-07-08 12:27 [PATCH 0/8 -v3] memcg cleanups + get rid of mm_struct::owner Michal Hocko
2015-07-08 12:27 ` Michal Hocko
2015-07-08 12:27 ` [PATCH 1/8] memcg: export struct mem_cgroup Michal Hocko
2015-07-08 12:27 ` Michal Hocko
2015-07-08 15:39 ` Vladimir Davydov
2015-07-08 15:39 ` Vladimir Davydov
2015-07-09 11:22 ` Michal Hocko
2015-07-09 11:22 ` Michal Hocko
2015-07-09 11:51 ` Vladimir Davydov
2015-07-09 11:51 ` Vladimir Davydov
2015-07-08 12:27 ` [PATCH 2/8] memcg: get rid of mem_cgroup_root_css for !CONFIG_MEMCG Michal Hocko
2015-07-08 12:27 ` Michal Hocko
2015-07-08 15:41 ` Vladimir Davydov
2015-07-08 15:41 ` Vladimir Davydov
2015-07-08 12:27 ` [PATCH 3/8] memcg: get rid of extern for functions in memcontrol.h Michal Hocko
2015-07-08 12:27 ` Michal Hocko
2015-07-08 15:43 ` Vladimir Davydov
2015-07-08 15:43 ` Vladimir Davydov
2015-07-08 12:27 ` [PATCH 4/8] memcg, mm: move mem_cgroup_select_victim_node into vmscan Michal Hocko
2015-07-08 12:27 ` Michal Hocko
2015-07-08 16:01 ` Vladimir Davydov
2015-07-08 16:01 ` Vladimir Davydov
2015-07-09 12:08 ` Michal Hocko
2015-07-09 12:08 ` Michal Hocko
2015-07-08 12:27 ` [PATCH 5/8] memcg: restructure mem_cgroup_can_attach() Michal Hocko
2015-07-08 12:27 ` Michal Hocko
2015-07-08 16:05 ` Vladimir Davydov
2015-07-08 16:05 ` Vladimir Davydov
2015-07-08 12:27 ` [PATCH 6/8] memcg, tcp_kmem: check for cg_proto in sock_update_memcg Michal Hocko
2015-07-08 12:27 ` Michal Hocko
2015-07-08 16:11 ` Vladimir Davydov
2015-07-08 16:11 ` Vladimir Davydov
2015-07-08 12:27 ` [PATCH 7/8] memcg: get rid of mm_struct::owner Michal Hocko
2015-07-08 12:27 ` Michal Hocko
2015-07-08 17:32 ` Vladimir Davydov
2015-07-08 17:32 ` Vladimir Davydov
2015-07-09 14:09 ` Michal Hocko
2015-07-09 14:09 ` Michal Hocko
2015-07-10 7:54 ` Vladimir Davydov [this message]
2015-07-10 7:54 ` Vladimir Davydov
2015-07-10 12:45 ` Michal Hocko
2015-07-10 12:45 ` Michal Hocko
2015-07-11 7:09 ` Vladimir Davydov
2015-07-11 7:09 ` Vladimir Davydov
2015-07-14 15:32 ` Michal Hocko
2015-07-14 15:32 ` Michal Hocko
2015-07-10 14:05 ` Michal Hocko
2015-07-10 14:05 ` Michal Hocko
2015-07-14 15:18 ` Michal Hocko
2015-07-14 15:18 ` Michal Hocko
2015-07-29 11:58 ` Michal Hocko
2015-07-29 11:58 ` Michal Hocko
2015-07-29 13:14 ` Johannes Weiner
2015-07-29 13:14 ` Johannes Weiner
2015-07-29 15:05 ` Michal Hocko
2015-07-29 15:05 ` Michal Hocko
2015-07-29 16:42 ` Johannes Weiner
2015-07-29 16:42 ` Johannes Weiner
2015-07-08 12:27 ` [PATCH 8/8] memcg: get rid of mem_cgroup_from_task Michal Hocko
2015-07-08 12:27 ` Michal Hocko
2015-07-08 17:43 ` Vladimir Davydov
2015-07-08 17:43 ` Vladimir Davydov
2015-07-09 14:13 ` Michal Hocko
2015-07-09 14:13 ` Michal Hocko
2015-07-09 14:32 ` Vladimir Davydov
2015-07-09 14:32 ` Vladimir Davydov
2015-07-09 16:33 ` Michal Hocko
2015-07-09 16:33 ` Michal Hocko
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20150710075400.GN2436@esperanza \
--to=vdavydov@parallels.com \
--cc=akpm@linux-foundation.org \
--cc=gthelen@google.com \
--cc=hannes@cmpxchg.org \
--cc=kamezawa.hiroyu@jp.fujitsu.com \
--cc=kosaki.motohiro@jp.fujitsu.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mhocko@kernel.org \
--cc=oleg@redhat.com \
--cc=tj@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.