From: ebiederm@xmission.com (Eric W. Biederman)
To: Michal Hocko <mhocko@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Johannes Weiner <hannes@cmpxchg.org>,
Kirill Tkhai <ktkhai@virtuozzo.com>,
peterz@infradead.org, viro@zeniv.linux.org.uk, mingo@kernel.org,
paulmck@linux.vnet.ibm.com, keescook@chromium.org,
riel@redhat.com, tglx@linutronix.de,
kirill.shutemov@linux.intel.com, marcos.souza.org@gmail.com,
hoeun.ryu@gmail.com, pasha.tatashin@oracle.com,
gs051095@gmail.com, dhowells@redhat.com, rppt@linux.vnet.ibm.com,
linux-kernel@vger.kernel.org,
Balbir Singh <balbir@linux.vnet.ibm.com>,
Tejun Heo <tj@kernel.org>, Oleg Nesterov <oleg@redhat.com>
Subject: Re: [RFC][PATCH 1/2] memcg: Ensure every task that uses an mm is in the same memory cgroup
Date: Thu, 07 Jun 2018 06:42:49 -0500 [thread overview]
Message-ID: <87wova6cw6.fsf@xmission.com> (raw)
In-Reply-To: <20180606111333.GF32433@dhcp22.suse.cz> (Michal Hocko's message of "Wed, 6 Jun 2018 13:13:33 +0200")
Michal Hocko <mhocko@kernel.org> writes:
> On Fri 01-06-18 09:53:09, Eric W. Biederman wrote:
> [...]
>> +static int mem_cgroup_mm_can_attach(struct cgroup_taskset *tset)
>> +{
>> + struct cgroup_subsys_state *css, *tcss;
>> + struct task_struct *p, *t;
>> + struct mm_struct *mm = NULL;
>> + int ret = -EINVAL;
>> +
>> + /*
>> + * Ensure all references for every mm can be accounted for by
>> + * tasks that are being migrated.
>> + */
>> + rcu_read_lock();
>> + cgroup_taskset_for_each(p, css, tset) {
>> + int mm_users, mm_count;
>> +
>> + /* Does this task have an mm that has not been visited? */
>> + if (!p->mm || (p->flags & PF_KTHREAD) || (p->mm == mm))
>> + continue;
>> +
>> + mm = p->mm;
>> + mm_users = atomic_read(&mm->mm_users);
>> + if (mm_users == 1)
>> + continue;
>> +
>> + mm_count = 0;
>> + cgroup_taskset_for_each(t, tcss, tset) {
>> + if (t->mm != mm)
>> + continue;
>> + mm_count++;
>> + }
>> + /*
>> + * If there are no non-task references mm_users will
>> + * be stable as holding cgroup_thread_rwsem for write
>> + * blocks fork and exit.
>> + */
>> + if (mm_count != mm_users)
>> + goto out;
>
> Wouldn't it be much more simple to do something like this instead? Sure
> it will break migration of non-thread tasks sharing mms but I would like
> to see that this actually matters to anybody rather than make the code
> more complicated and maintain it forever without a good reason. So yes
> this is a bit harsh but considering that the migration suceeded silently
> and that was simply broken as well, I am not sure this is any worse.
I definitely agree that there are some other possibilities worth
exploring.
> Btw. MMF_ALIEN_MM could be used in the OOM proper as well.
There are two big issues I see with your suggested alternative.
1) cgroupv1 the task interface. We still need to deny migrating
part of a thread group.
2) vfork. That uses CLONE_MM and it gets used. At a quick look
I am seeing gcc, g++, cpp, emacs24, strace, calendar, nm, telnet, gdb,
and several other programs I don't recognize.
I believe your proposal will prevent onlining the memcgroup if there
is a compile running, because of failure to support vfork.
Further I expect we would need a count rather than a bit that gets set
and never gets cleared. Or else even when the mm is no longer shared by
vfork we will still think it is.
Michal do you have an opinion on my previous patch?
I just want to make certain that this fun work does not get all of the
attention, and the bug fix actually gets reviewed.
Eric
> diff --git a/kernel/fork.c b/kernel/fork.c
> index dfe8e14c0fba..285cd0397307 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -1303,6 +1303,8 @@ static int copy_mm(unsigned long clone_flags, struct task_struct *tsk)
> if (clone_flags & CLONE_VM) {
> mmget(oldmm);
> mm = oldmm;
> + if (unlikely(!(clone_flags & CLONE_THREAD)))
> + set_bit(MMF_ALIEN_MM, &mm->flags);
> goto good_mm;
> }
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 2f5dac193431..fa0248fcdb36 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -5103,6 +5103,18 @@ static int mem_cgroup_can_attach(struct cgroup_taskset *tset)
> if (!mm)
> return 0;
>
> + /*
> + * Migrating a task which shares mm with other thread group
> + * has never been really well defined. Shout to the log when
> + * this happens and see whether anybody really complains.
> + * If so make sure to support migration if all processes sharing
> + * this mm are migrating together.
> + */
> + if (WARN_ON_ONCE(test_bit(MMF_ALIEN_MM, &mm->flags))) {
> + mmput(mm);
> + return -EBUSY;
> + }
> +
> /* We move charges except for creative uses of CLONE_VM */
> if (mm->memcg == from) {
> VM_BUG_ON(mc.from);
next prev parent reply other threads:[~2018-06-07 11:43 UTC|newest]
Thread overview: 96+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-04-26 11:00 [PATCH 0/4] exit: Make unlikely case in mm_update_next_owner() more scalable Kirill Tkhai
2018-04-26 11:00 ` [PATCH 1/4] exit: Move read_unlock() up in mm_update_next_owner() Kirill Tkhai
2018-04-26 15:01 ` Oleg Nesterov
2018-04-26 11:00 ` [PATCH 2/4] exit: Use rcu instead of get_task_struct() " Kirill Tkhai
2018-04-26 11:00 ` [PATCH 3/4] exit: Rename assign_new_owner label " Kirill Tkhai
2018-04-26 11:01 ` [PATCH 4/4] exit: Lockless iteration over task list " Kirill Tkhai
2018-04-26 12:35 ` Andrea Parri
2018-04-26 13:52 ` Kirill Tkhai
2018-04-26 15:20 ` Peter Zijlstra
2018-04-26 15:56 ` Kirill Tkhai
2018-04-26 15:20 ` Peter Zijlstra
2018-04-26 16:04 ` Kirill Tkhai
2018-04-26 15:29 ` Andrea Parri
2018-04-26 16:11 ` Kirill Tkhai
2018-04-26 13:07 ` [PATCH 0/4] exit: Make unlikely case in mm_update_next_owner() more scalable Michal Hocko
2018-04-26 13:52 ` Oleg Nesterov
2018-04-26 14:07 ` Kirill Tkhai
2018-04-26 15:10 ` Oleg Nesterov
2018-04-26 16:19 ` Eric W. Biederman
2018-04-26 19:28 ` Michal Hocko
2018-04-27 7:08 ` Michal Hocko
2018-04-27 18:05 ` Eric W. Biederman
2018-05-01 17:22 ` Eric W. Biederman
2018-05-01 17:35 ` [RFC][PATCH] memcg: Replace mm->owner with mm->memcg Eric W. Biederman
2018-05-02 8:47 ` Michal Hocko
2018-05-02 13:20 ` Johannes Weiner
2018-05-02 14:05 ` Eric W. Biederman
2018-05-02 19:21 ` [PATCH] " Eric W. Biederman
2018-05-02 21:04 ` Andrew Morton
2018-05-02 21:35 ` Eric W. Biederman
2018-05-03 13:33 ` Oleg Nesterov
2018-05-03 14:39 ` Eric W. Biederman
2018-05-04 14:20 ` Oleg Nesterov
2018-05-04 14:36 ` Eric W. Biederman
2018-05-04 14:54 ` Oleg Nesterov
2018-05-04 15:49 ` Eric W. Biederman
2018-05-04 16:22 ` Oleg Nesterov
2018-05-04 16:40 ` Eric W. Biederman
2018-05-04 17:26 ` [PATCH 0/2] mm->owner to mm->memcg fixes Eric W. Biederman
2018-05-04 17:26 ` [PATCH 1/2] memcg: Update the mm->memcg maintenance to work when !CONFIG_MMU Eric W. Biederman
2018-05-04 17:27 ` [PATCH 2/2] memcg: Close the race between migration and installing bprm->mm as mm Eric W. Biederman
2018-05-09 14:51 ` Oleg Nesterov
2018-05-10 3:00 ` Eric W. Biederman
2018-05-10 12:14 ` [PATCH 0/2] mm->owner to mm->memcg fixes Michal Hocko
2018-05-10 12:18 ` Michal Hocko
2018-05-22 12:57 ` Michal Hocko
2018-05-23 19:46 ` Eric W. Biederman
2018-05-24 11:10 ` Michal Hocko
2018-05-24 21:16 ` Andrew Morton
2018-05-24 23:37 ` Andrea Parri
2018-05-30 12:17 ` Michal Hocko
2018-05-31 18:41 ` Eric W. Biederman
2018-06-01 1:57 ` [PATCH] memcg: Replace mm->owner with mm->memcg Eric W. Biederman
2018-06-01 14:52 ` [RFC][PATCH 0/2] memcg: Require every task that uses an mm to migrate together Eric W. Biederman
2018-06-01 14:53 ` [RFC][PATCH 1/2] memcg: Ensure every task that uses an mm is in the same memory cgroup Eric W. Biederman
2018-06-01 16:50 ` Tejun Heo
2018-06-01 18:11 ` Eric W. Biederman
2018-06-01 19:16 ` Tejun Heo
2018-06-04 13:01 ` Michal Hocko
2018-06-04 18:47 ` Tejun Heo
2018-06-04 19:11 ` Eric W. Biederman
2018-06-06 11:13 ` Michal Hocko
2018-06-07 11:42 ` Eric W. Biederman [this message]
2018-06-07 12:19 ` Michal Hocko
2018-06-01 14:53 ` [RFC][PATCH 2/2] memcgl: Remove dead code now that all tasks of an mm share a memcg Eric W. Biederman
2018-06-01 14:07 ` [PATCH 0/2] mm->owner to mm->memcg fixes Michal Hocko
2018-05-24 21:17 ` Andrew Morton
2018-05-30 11:52 ` Michal Hocko
2018-05-31 17:43 ` Eric W. Biederman
2018-05-07 14:33 ` [PATCH] memcg: Replace mm->owner with mm->memcg Oleg Nesterov
2018-05-08 3:15 ` Eric W. Biederman
2018-05-09 14:40 ` Oleg Nesterov
2018-05-10 3:09 ` Eric W. Biederman
2018-05-10 4:03 ` [RFC][PATCH] cgroup: Don't mess with tasks in exec Eric W. Biederman
2018-05-10 12:15 ` Oleg Nesterov
2018-05-10 12:35 ` Tejun Heo
2018-05-10 12:38 ` [PATCH] memcg: Replace mm->owner with mm->memcg Oleg Nesterov
2018-05-04 11:07 ` Michal Hocko
2018-05-05 16:54 ` kbuild test robot
2018-05-07 23:18 ` Andrew Morton
2018-05-08 2:17 ` Eric W. Biederman
2018-05-09 21:00 ` Michal Hocko
2018-05-02 23:59 ` [RFC][PATCH] " Balbir Singh
2018-05-03 15:11 ` Eric W. Biederman
2018-05-04 4:59 ` Balbir Singh
2018-05-03 10:52 ` [PATCH 0/4] exit: Make unlikely case in mm_update_next_owner() more scalable Kirill Tkhai
2018-06-01 1:07 ` Eric W. Biederman
2018-06-01 13:57 ` Michal Hocko
2018-06-01 14:32 ` Eric W. Biederman
2018-06-01 15:02 ` Michal Hocko
2018-06-01 15:25 ` Eric W. Biederman
2018-06-04 6:54 ` Michal Hocko
2018-06-04 14:31 ` Eric W. Biederman
2018-06-05 8:15 ` Michal Hocko
2018-06-05 8:48 ` Kirill Tkhai
2018-06-05 15:36 ` Eric W. Biederman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87wova6cw6.fsf@xmission.com \
--to=ebiederm@xmission.com \
--cc=akpm@linux-foundation.org \
--cc=balbir@linux.vnet.ibm.com \
--cc=dhowells@redhat.com \
--cc=gs051095@gmail.com \
--cc=hannes@cmpxchg.org \
--cc=hoeun.ryu@gmail.com \
--cc=keescook@chromium.org \
--cc=kirill.shutemov@linux.intel.com \
--cc=ktkhai@virtuozzo.com \
--cc=linux-kernel@vger.kernel.org \
--cc=marcos.souza.org@gmail.com \
--cc=mhocko@kernel.org \
--cc=mingo@kernel.org \
--cc=oleg@redhat.com \
--cc=pasha.tatashin@oracle.com \
--cc=paulmck@linux.vnet.ibm.com \
--cc=peterz@infradead.org \
--cc=riel@redhat.com \
--cc=rppt@linux.vnet.ibm.com \
--cc=tglx@linutronix.de \
--cc=tj@kernel.org \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.