From: ebiederm@xmission.com (Eric W. Biederman)
To: Michal Hocko <mhocko@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Johannes Weiner <hannes@cmpxchg.org>,
Kirill Tkhai <ktkhai@virtuozzo.com>,
peterz@infradead.org, viro@zeniv.linux.org.uk, mingo@kernel.org,
paulmck@linux.vnet.ibm.com, keescook@chromium.org,
riel@redhat.com, tglx@linutronix.de,
kirill.shutemov@linux.intel.com, marcos.souza.org@gmail.com,
hoeun.ryu@gmail.com, pasha.tatashin@oracle.com,
gs051095@gmail.com, dhowells@redhat.com, rppt@linux.vnet.ibm.com,
linux-kernel@vger.kernel.org,
Balbir Singh <balbir@linux.vnet.ibm.com>,
Tejun Heo <tj@kernel.org>, Oleg Nesterov <oleg@redhat.com>
Subject: Re: [PATCH 0/2] mm->owner to mm->memcg fixes
Date: Wed, 23 May 2018 14:46:43 -0500 [thread overview]
Message-ID: <87wovu889o.fsf@xmission.com> (raw)
In-Reply-To: <20180522125757.GL20020@dhcp22.suse.cz> (Michal Hocko's message of "Tue, 22 May 2018 14:57:57 +0200")
Michal Hocko <mhocko@kernel.org> writes:
> On Thu 10-05-18 14:14:18, Michal Hocko wrote:
>> On Fri 04-05-18 12:26:03, Eric W. Biederman wrote:
>> >
>> > Andrew can you pick up these two fixes as well.
>> >
>> > These address the issues Michal Hocko and Oleg Nesterov noticed.
>>
>> I completely got lost in this thread. There are way too many things
>> discussed at once. Could you repost the full series for a proper review
>> please?
>
> ping
Quick summary of where things stand. (Just getting back from vacation
and catching up with things).
Andrew has picked up the best version of these patches and you can look
at his tree to find the current patches.
Looking at my tree the issues that have been looked at above
the basic patch are:
!CONFIG_MMU support (code motion)
Races during exec. (Roughly solved but I think we can do better by
expanding the scope of
cgroup_threadgroup_change_begin/end in exec and
just making exec atomic wrt to cgroup changes)
While I was on vacation you posted an old concern about a condtion
where get_mem_cgroup_from_mm was not guaranteed to complete, and how
that interacts with charge migration.
Looking at your description the concern is that cgroup_rmdir can succeed
when a cgroup has just an mm in it (and no tasks). The result is
that mem_cgroup_try_charge can loop indefinitely in that case.
That is possible with two processes sharing the same mm, but living in
different memory cgroups. That is a silly useless configuration but
something to support at least to the level of making certain kernel's
don't wedge when it happens by accident or maliciously.
The suggestion of having cgroup_is_populated take this into account
seems like a good general solution but I don't think that is strictly
necessary.
In the spirit of let's make this work. How about:
From: "Eric W. Biederman" <ebiederm@xmission.com>
Date: Wed, 23 May 2018 13:48:31 -0500
Subject: [PATCH] memcontrol: Guarantee that get_mem_cgroup_from_mm does not
loop forever
The root memory cgroup should always be online, so when returning it
unconditionally bump it's refcount.
The only way for mm->memcg to point to an offline memory cgroup is if
CLONE_VM was used to create two processes that share a mm and those
processes were put in different memory cgroups. If one of those
processes exited and then cgroup_rmdir was called on it's memory
cgroup.
As two processes sharing an mm is useless and highly unlikely there is
no need to handle this case well, it just needs to be handled well
enough to prevent an indefinite loop. So when css_tryget_online fails
just treat the mm as belong to the root memory cgroup.
The cgroup migration and the cgroup remove actions both happen
with the the cgroup_mutex held. So there is no need to worry about
a mm that is migrating cgroups not having a live cgroup.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
mm/memcontrol.c | 34 +++++++++++++++++++---------------
1 file changed, 19 insertions(+), 15 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index d74aeba7dfed..dbb197bfc517 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -666,25 +666,29 @@ static void memcg_check_events(struct mem_cgroup *memcg, struct page *page)
static struct mem_cgroup *get_mem_cgroup_from_mm(struct mm_struct *mm)
{
- struct mem_cgroup *memcg = NULL;
+ struct mem_cgroup *memcg;
rcu_read_lock();
- do {
- /*
- * Page cache insertions can happen withou an
- * actual mm context, e.g. during disk probing
- * on boot, loopback IO, acct() writes etc.
- */
- if (unlikely(!mm))
- memcg = root_mem_cgroup;
- else {
- memcg = rcu_dereference(mm->memcg);
- if (unlikely(!memcg))
- memcg = root_mem_cgroup;
- }
- } while (!css_tryget_online(&memcg->css));
+ /*
+ * Page cache insertions can happen withou an
+ * actual mm context, e.g. during disk probing
+ * on boot, loopback IO, acct() writes etc.
+ */
+ if (unlikely(!mm))
+ goto root_memcg;
+
+ memcg = rcu_dereference(mm->memcg);
+ if (unlikely(!memcg))
+ goto root_memcg;
+ if (!css_tryget_online(&memcg->css))
+ goto root_memcg;
rcu_read_unlock();
return memcg;
+
+root_memcg:
+ css_get(&root_mem_cgroup->css);
+ rcu_read_unlock();
+ return root_mem_cgroup;
}
/**
--
2.14.1
next prev parent reply other threads:[~2018-05-23 19:46 UTC|newest]
Thread overview: 96+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-04-26 11:00 [PATCH 0/4] exit: Make unlikely case in mm_update_next_owner() more scalable Kirill Tkhai
2018-04-26 11:00 ` [PATCH 1/4] exit: Move read_unlock() up in mm_update_next_owner() Kirill Tkhai
2018-04-26 15:01 ` Oleg Nesterov
2018-04-26 11:00 ` [PATCH 2/4] exit: Use rcu instead of get_task_struct() " Kirill Tkhai
2018-04-26 11:00 ` [PATCH 3/4] exit: Rename assign_new_owner label " Kirill Tkhai
2018-04-26 11:01 ` [PATCH 4/4] exit: Lockless iteration over task list " Kirill Tkhai
2018-04-26 12:35 ` Andrea Parri
2018-04-26 13:52 ` Kirill Tkhai
2018-04-26 15:20 ` Peter Zijlstra
2018-04-26 15:56 ` Kirill Tkhai
2018-04-26 15:20 ` Peter Zijlstra
2018-04-26 16:04 ` Kirill Tkhai
2018-04-26 15:29 ` Andrea Parri
2018-04-26 16:11 ` Kirill Tkhai
2018-04-26 13:07 ` [PATCH 0/4] exit: Make unlikely case in mm_update_next_owner() more scalable Michal Hocko
2018-04-26 13:52 ` Oleg Nesterov
2018-04-26 14:07 ` Kirill Tkhai
2018-04-26 15:10 ` Oleg Nesterov
2018-04-26 16:19 ` Eric W. Biederman
2018-04-26 19:28 ` Michal Hocko
2018-04-27 7:08 ` Michal Hocko
2018-04-27 18:05 ` Eric W. Biederman
2018-05-01 17:22 ` Eric W. Biederman
2018-05-01 17:35 ` [RFC][PATCH] memcg: Replace mm->owner with mm->memcg Eric W. Biederman
2018-05-02 8:47 ` Michal Hocko
2018-05-02 13:20 ` Johannes Weiner
2018-05-02 14:05 ` Eric W. Biederman
2018-05-02 19:21 ` [PATCH] " Eric W. Biederman
2018-05-02 21:04 ` Andrew Morton
2018-05-02 21:35 ` Eric W. Biederman
2018-05-03 13:33 ` Oleg Nesterov
2018-05-03 14:39 ` Eric W. Biederman
2018-05-04 14:20 ` Oleg Nesterov
2018-05-04 14:36 ` Eric W. Biederman
2018-05-04 14:54 ` Oleg Nesterov
2018-05-04 15:49 ` Eric W. Biederman
2018-05-04 16:22 ` Oleg Nesterov
2018-05-04 16:40 ` Eric W. Biederman
2018-05-04 17:26 ` [PATCH 0/2] mm->owner to mm->memcg fixes Eric W. Biederman
2018-05-04 17:26 ` [PATCH 1/2] memcg: Update the mm->memcg maintenance to work when !CONFIG_MMU Eric W. Biederman
2018-05-04 17:27 ` [PATCH 2/2] memcg: Close the race between migration and installing bprm->mm as mm Eric W. Biederman
2018-05-09 14:51 ` Oleg Nesterov
2018-05-10 3:00 ` Eric W. Biederman
2018-05-10 12:14 ` [PATCH 0/2] mm->owner to mm->memcg fixes Michal Hocko
2018-05-10 12:18 ` Michal Hocko
2018-05-22 12:57 ` Michal Hocko
2018-05-23 19:46 ` Eric W. Biederman [this message]
2018-05-24 11:10 ` Michal Hocko
2018-05-24 21:16 ` Andrew Morton
2018-05-24 23:37 ` Andrea Parri
2018-05-30 12:17 ` Michal Hocko
2018-05-31 18:41 ` Eric W. Biederman
2018-06-01 1:57 ` [PATCH] memcg: Replace mm->owner with mm->memcg Eric W. Biederman
2018-06-01 14:52 ` [RFC][PATCH 0/2] memcg: Require every task that uses an mm to migrate together Eric W. Biederman
2018-06-01 14:53 ` [RFC][PATCH 1/2] memcg: Ensure every task that uses an mm is in the same memory cgroup Eric W. Biederman
2018-06-01 16:50 ` Tejun Heo
2018-06-01 18:11 ` Eric W. Biederman
2018-06-01 19:16 ` Tejun Heo
2018-06-04 13:01 ` Michal Hocko
2018-06-04 18:47 ` Tejun Heo
2018-06-04 19:11 ` Eric W. Biederman
2018-06-06 11:13 ` Michal Hocko
2018-06-07 11:42 ` Eric W. Biederman
2018-06-07 12:19 ` Michal Hocko
2018-06-01 14:53 ` [RFC][PATCH 2/2] memcgl: Remove dead code now that all tasks of an mm share a memcg Eric W. Biederman
2018-06-01 14:07 ` [PATCH 0/2] mm->owner to mm->memcg fixes Michal Hocko
2018-05-24 21:17 ` Andrew Morton
2018-05-30 11:52 ` Michal Hocko
2018-05-31 17:43 ` Eric W. Biederman
2018-05-07 14:33 ` [PATCH] memcg: Replace mm->owner with mm->memcg Oleg Nesterov
2018-05-08 3:15 ` Eric W. Biederman
2018-05-09 14:40 ` Oleg Nesterov
2018-05-10 3:09 ` Eric W. Biederman
2018-05-10 4:03 ` [RFC][PATCH] cgroup: Don't mess with tasks in exec Eric W. Biederman
2018-05-10 12:15 ` Oleg Nesterov
2018-05-10 12:35 ` Tejun Heo
2018-05-10 12:38 ` [PATCH] memcg: Replace mm->owner with mm->memcg Oleg Nesterov
2018-05-04 11:07 ` Michal Hocko
2018-05-05 16:54 ` kbuild test robot
2018-05-07 23:18 ` Andrew Morton
2018-05-08 2:17 ` Eric W. Biederman
2018-05-09 21:00 ` Michal Hocko
2018-05-02 23:59 ` [RFC][PATCH] " Balbir Singh
2018-05-03 15:11 ` Eric W. Biederman
2018-05-04 4:59 ` Balbir Singh
2018-05-03 10:52 ` [PATCH 0/4] exit: Make unlikely case in mm_update_next_owner() more scalable Kirill Tkhai
2018-06-01 1:07 ` Eric W. Biederman
2018-06-01 13:57 ` Michal Hocko
2018-06-01 14:32 ` Eric W. Biederman
2018-06-01 15:02 ` Michal Hocko
2018-06-01 15:25 ` Eric W. Biederman
2018-06-04 6:54 ` Michal Hocko
2018-06-04 14:31 ` Eric W. Biederman
2018-06-05 8:15 ` Michal Hocko
2018-06-05 8:48 ` Kirill Tkhai
2018-06-05 15:36 ` Eric W. Biederman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87wovu889o.fsf@xmission.com \
--to=ebiederm@xmission.com \
--cc=akpm@linux-foundation.org \
--cc=balbir@linux.vnet.ibm.com \
--cc=dhowells@redhat.com \
--cc=gs051095@gmail.com \
--cc=hannes@cmpxchg.org \
--cc=hoeun.ryu@gmail.com \
--cc=keescook@chromium.org \
--cc=kirill.shutemov@linux.intel.com \
--cc=ktkhai@virtuozzo.com \
--cc=linux-kernel@vger.kernel.org \
--cc=marcos.souza.org@gmail.com \
--cc=mhocko@kernel.org \
--cc=mingo@kernel.org \
--cc=oleg@redhat.com \
--cc=pasha.tatashin@oracle.com \
--cc=paulmck@linux.vnet.ibm.com \
--cc=peterz@infradead.org \
--cc=riel@redhat.com \
--cc=rppt@linux.vnet.ibm.com \
--cc=tglx@linutronix.de \
--cc=tj@kernel.org \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.