From: ebiederm@xmission.com (Eric W. Biederman)
To: Michal Hocko <mhocko@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Johannes Weiner <hannes@cmpxchg.org>,
Kirill Tkhai <ktkhai@virtuozzo.com>,
peterz@infradead.org, viro@zeniv.linux.org.uk, mingo@kernel.org,
paulmck@linux.vnet.ibm.com, keescook@chromium.org,
riel@redhat.com, tglx@linutronix.de,
kirill.shutemov@linux.intel.com, marcos.souza.org@gmail.com,
hoeun.ryu@gmail.com, pasha.tatashin@oracle.com,
gs051095@gmail.com, dhowells@redhat.com, rppt@linux.vnet.ibm.com,
linux-kernel@vger.kernel.org,
Balbir Singh <balbir@linux.vnet.ibm.com>,
Tejun Heo <tj@kernel.org>, Oleg Nesterov <oleg@redhat.com>
Subject: [RFC][PATCH 1/2] memcg: Ensure every task that uses an mm is in the same memory cgroup
Date: Fri, 01 Jun 2018 09:53:09 -0500 [thread overview]
Message-ID: <87sh66wobu.fsf_-_@xmission.com> (raw)
In-Reply-To: <87y3fywodn.fsf_-_@xmission.com> (Eric W. Biederman's message of "Fri, 01 Jun 2018 09:52:04 -0500")
>From a userspace perspective the cgroup of a mm is determined by which
the cgroup the task belongs too. As practically an mm can only belong to
a single memory cgroup having multiple tasks with the same mm in different
memory cgroups is not well defined.
Avoid the difficulties of dealing with crazy semantics and restrict all
tasks that share a single mm to the same memory cgroup.
This is accomplished by adding a new function mem_cgroup_mm_can_attach
that checks this condition with a straight forward algorithm. In the worst
case it is O(N^2). In the common case it should be O(N) in the number of
tasks being migrated. As typically only a single process and thus a single
process is being migrated and it is optimized for that case.
There are users of mmget such as proc that can create references to an
mm this function can not find. This results in an unnecessary
migration failure. It does not break the invariant that every task of
an mm stays in the same memory cgroup. So this condition is annoying
but harmelss.
This requires multi-threaded mm's to be migrated using the procs file.
This effectively forbids process with mm's shared processes being migrated.
Although enabling the control file might work.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
mm/memcontrol.c | 51 ++++++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 50 insertions(+), 1 deletion(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 21c13f4768eb..078ef562bb90 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4798,6 +4798,50 @@ static void mem_cgroup_clear_mc(void)
mmput(mm);
}
+static int mem_cgroup_mm_can_attach(struct cgroup_taskset *tset)
+{
+ struct cgroup_subsys_state *css, *tcss;
+ struct task_struct *p, *t;
+ struct mm_struct *mm = NULL;
+ int ret = -EINVAL;
+
+ /*
+ * Ensure all references for every mm can be accounted for by
+ * tasks that are being migrated.
+ */
+ rcu_read_lock();
+ cgroup_taskset_for_each(p, css, tset) {
+ int mm_users, mm_count;
+
+ /* Does this task have an mm that has not been visited? */
+ if (!p->mm || (p->flags & PF_KTHREAD) || (p->mm == mm))
+ continue;
+
+ mm = p->mm;
+ mm_users = atomic_read(&mm->mm_users);
+ if (mm_users == 1)
+ continue;
+
+ mm_count = 0;
+ cgroup_taskset_for_each(t, tcss, tset) {
+ if (t->mm != mm)
+ continue;
+ mm_count++;
+ }
+ /*
+ * If there are no non-task references mm_users will
+ * be stable as holding cgroup_thread_rwsem for write
+ * blocks fork and exit.
+ */
+ if (mm_count != mm_users)
+ goto out;
+ }
+ ret = 0;
+out:
+ rcu_read_unlock();
+ return ret;
+}
+
static int mem_cgroup_can_attach(struct cgroup_taskset *tset)
{
struct cgroup_subsys_state *css;
@@ -4806,7 +4850,12 @@ static int mem_cgroup_can_attach(struct cgroup_taskset *tset)
struct task_struct *leader, *p;
struct mm_struct *mm;
unsigned long move_flags;
- int ret = 0;
+ int ret;
+
+ /* Is every task of every mm in tset being moved? */
+ ret = mem_cgroup_mm_can_attach(tset);
+ if (ret)
+ return ret;
/* charge immigration isn't supported on the default hierarchy */
if (cgroup_subsys_on_dfl(memory_cgrp_subsys))
--
2.14.1
next prev parent reply other threads:[~2018-06-01 14:53 UTC|newest]
Thread overview: 96+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-04-26 11:00 [PATCH 0/4] exit: Make unlikely case in mm_update_next_owner() more scalable Kirill Tkhai
2018-04-26 11:00 ` [PATCH 1/4] exit: Move read_unlock() up in mm_update_next_owner() Kirill Tkhai
2018-04-26 15:01 ` Oleg Nesterov
2018-04-26 11:00 ` [PATCH 2/4] exit: Use rcu instead of get_task_struct() " Kirill Tkhai
2018-04-26 11:00 ` [PATCH 3/4] exit: Rename assign_new_owner label " Kirill Tkhai
2018-04-26 11:01 ` [PATCH 4/4] exit: Lockless iteration over task list " Kirill Tkhai
2018-04-26 12:35 ` Andrea Parri
2018-04-26 13:52 ` Kirill Tkhai
2018-04-26 15:20 ` Peter Zijlstra
2018-04-26 15:56 ` Kirill Tkhai
2018-04-26 15:20 ` Peter Zijlstra
2018-04-26 16:04 ` Kirill Tkhai
2018-04-26 15:29 ` Andrea Parri
2018-04-26 16:11 ` Kirill Tkhai
2018-04-26 13:07 ` [PATCH 0/4] exit: Make unlikely case in mm_update_next_owner() more scalable Michal Hocko
2018-04-26 13:52 ` Oleg Nesterov
2018-04-26 14:07 ` Kirill Tkhai
2018-04-26 15:10 ` Oleg Nesterov
2018-04-26 16:19 ` Eric W. Biederman
2018-04-26 19:28 ` Michal Hocko
2018-04-27 7:08 ` Michal Hocko
2018-04-27 18:05 ` Eric W. Biederman
2018-05-01 17:22 ` Eric W. Biederman
2018-05-01 17:35 ` [RFC][PATCH] memcg: Replace mm->owner with mm->memcg Eric W. Biederman
2018-05-02 8:47 ` Michal Hocko
2018-05-02 13:20 ` Johannes Weiner
2018-05-02 14:05 ` Eric W. Biederman
2018-05-02 19:21 ` [PATCH] " Eric W. Biederman
2018-05-02 21:04 ` Andrew Morton
2018-05-02 21:35 ` Eric W. Biederman
2018-05-03 13:33 ` Oleg Nesterov
2018-05-03 14:39 ` Eric W. Biederman
2018-05-04 14:20 ` Oleg Nesterov
2018-05-04 14:36 ` Eric W. Biederman
2018-05-04 14:54 ` Oleg Nesterov
2018-05-04 15:49 ` Eric W. Biederman
2018-05-04 16:22 ` Oleg Nesterov
2018-05-04 16:40 ` Eric W. Biederman
2018-05-04 17:26 ` [PATCH 0/2] mm->owner to mm->memcg fixes Eric W. Biederman
2018-05-04 17:26 ` [PATCH 1/2] memcg: Update the mm->memcg maintenance to work when !CONFIG_MMU Eric W. Biederman
2018-05-04 17:27 ` [PATCH 2/2] memcg: Close the race between migration and installing bprm->mm as mm Eric W. Biederman
2018-05-09 14:51 ` Oleg Nesterov
2018-05-10 3:00 ` Eric W. Biederman
2018-05-10 12:14 ` [PATCH 0/2] mm->owner to mm->memcg fixes Michal Hocko
2018-05-10 12:18 ` Michal Hocko
2018-05-22 12:57 ` Michal Hocko
2018-05-23 19:46 ` Eric W. Biederman
2018-05-24 11:10 ` Michal Hocko
2018-05-24 21:16 ` Andrew Morton
2018-05-24 23:37 ` Andrea Parri
2018-05-30 12:17 ` Michal Hocko
2018-05-31 18:41 ` Eric W. Biederman
2018-06-01 1:57 ` [PATCH] memcg: Replace mm->owner with mm->memcg Eric W. Biederman
2018-06-01 14:52 ` [RFC][PATCH 0/2] memcg: Require every task that uses an mm to migrate together Eric W. Biederman
2018-06-01 14:53 ` Eric W. Biederman [this message]
2018-06-01 16:50 ` [RFC][PATCH 1/2] memcg: Ensure every task that uses an mm is in the same memory cgroup Tejun Heo
2018-06-01 18:11 ` Eric W. Biederman
2018-06-01 19:16 ` Tejun Heo
2018-06-04 13:01 ` Michal Hocko
2018-06-04 18:47 ` Tejun Heo
2018-06-04 19:11 ` Eric W. Biederman
2018-06-06 11:13 ` Michal Hocko
2018-06-07 11:42 ` Eric W. Biederman
2018-06-07 12:19 ` Michal Hocko
2018-06-01 14:53 ` [RFC][PATCH 2/2] memcgl: Remove dead code now that all tasks of an mm share a memcg Eric W. Biederman
2018-06-01 14:07 ` [PATCH 0/2] mm->owner to mm->memcg fixes Michal Hocko
2018-05-24 21:17 ` Andrew Morton
2018-05-30 11:52 ` Michal Hocko
2018-05-31 17:43 ` Eric W. Biederman
2018-05-07 14:33 ` [PATCH] memcg: Replace mm->owner with mm->memcg Oleg Nesterov
2018-05-08 3:15 ` Eric W. Biederman
2018-05-09 14:40 ` Oleg Nesterov
2018-05-10 3:09 ` Eric W. Biederman
2018-05-10 4:03 ` [RFC][PATCH] cgroup: Don't mess with tasks in exec Eric W. Biederman
2018-05-10 12:15 ` Oleg Nesterov
2018-05-10 12:35 ` Tejun Heo
2018-05-10 12:38 ` [PATCH] memcg: Replace mm->owner with mm->memcg Oleg Nesterov
2018-05-04 11:07 ` Michal Hocko
2018-05-05 16:54 ` kbuild test robot
2018-05-07 23:18 ` Andrew Morton
2018-05-08 2:17 ` Eric W. Biederman
2018-05-09 21:00 ` Michal Hocko
2018-05-02 23:59 ` [RFC][PATCH] " Balbir Singh
2018-05-03 15:11 ` Eric W. Biederman
2018-05-04 4:59 ` Balbir Singh
2018-05-03 10:52 ` [PATCH 0/4] exit: Make unlikely case in mm_update_next_owner() more scalable Kirill Tkhai
2018-06-01 1:07 ` Eric W. Biederman
2018-06-01 13:57 ` Michal Hocko
2018-06-01 14:32 ` Eric W. Biederman
2018-06-01 15:02 ` Michal Hocko
2018-06-01 15:25 ` Eric W. Biederman
2018-06-04 6:54 ` Michal Hocko
2018-06-04 14:31 ` Eric W. Biederman
2018-06-05 8:15 ` Michal Hocko
2018-06-05 8:48 ` Kirill Tkhai
2018-06-05 15:36 ` Eric W. Biederman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87sh66wobu.fsf_-_@xmission.com \
--to=ebiederm@xmission.com \
--cc=akpm@linux-foundation.org \
--cc=balbir@linux.vnet.ibm.com \
--cc=dhowells@redhat.com \
--cc=gs051095@gmail.com \
--cc=hannes@cmpxchg.org \
--cc=hoeun.ryu@gmail.com \
--cc=keescook@chromium.org \
--cc=kirill.shutemov@linux.intel.com \
--cc=ktkhai@virtuozzo.com \
--cc=linux-kernel@vger.kernel.org \
--cc=marcos.souza.org@gmail.com \
--cc=mhocko@kernel.org \
--cc=mingo@kernel.org \
--cc=oleg@redhat.com \
--cc=pasha.tatashin@oracle.com \
--cc=paulmck@linux.vnet.ibm.com \
--cc=peterz@infradead.org \
--cc=riel@redhat.com \
--cc=rppt@linux.vnet.ibm.com \
--cc=tglx@linutronix.de \
--cc=tj@kernel.org \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).