From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 10 Oct 2017 23:04:17 +0100
From: Roman Gushchin
To: David Rientjes
CC: Michal Hocko, Vladimir Davydov, Johannes Weiner, Tetsuo Handa,
	Andrew Morton, Tejun Heo
Subject: Re: [v11 3/6] mm, oom: cgroup-aware OOM killer
Message-ID: <20171010220417.GA8667@castle>
References: <20171005130454.5590-1-guro@fb.com>
 <20171005130454.5590-4-guro@fb.com>
 <20171010122306.GA11653@castle.DHCP.thefacebook.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.9.1 (2017-09-22)
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Oct 10, 2017 at 02:13:00PM -0700, David Rientjes wrote:
> On Tue, 10 Oct 2017, Roman Gushchin wrote:
>
> > > This seems to unfairly bias the root mem cgroup depending on process size.
> > > It isn't treated fairly as a leaf mem cgroup if they are being compared
> > > based on different criteria: the root mem cgroup as (mostly) the largest
> > > rss of a single process vs leaf mem cgroups as all anon, unevictable, and
> > > unreclaimable slab pages charged to it by all processes.
> > >
> > > I imagine a configuration where the root mem cgroup has 100 processes
> > > attached, each with an rss of 80MB, compared to a leaf cgroup with 100
> > > processes of 1MB rss each. How does this logic prevent repeatedly oom
> > > killing the processes of 1MB rss?
> > >
> > > In this case, "the root cgroup is treated as a leaf memory cgroup" isn't
> > > quite fair: it can simply hide large processes from being selected. Users
> > > who configure cgroups in a unified hierarchy for other resource
> > > constraints are penalized for this choice even though the mem cgroup with
> > > 100 processes of 1MB rss each may not be limited itself.
> > >
> > > I think for this comparison to be fair, it requires accounting for the
> > > root mem cgroup itself, or a different accounting methodology for leaf
> > > memory cgroups.
> >
> > This is basically a workaround, because we don't have the necessary stats
> > for the root memory cgroup. If we start gathering them at some point, we
> > can change this and treat the root memcg exactly like other leaf cgroups.
>
> I understand why it currently cannot be an apples vs apples comparison
> without, as I suggested in the last paragraph, the same accounting being
> done for the root mem cgroup, which is intuitive if it is to be considered
> on the same basis as leaf mem cgroups.
>
> I understand that, for the design to work, leaf mem cgroups and the root
> mem cgroup must be compared if processes can be attached to the root mem
> cgroup. My point is that it is currently completely unfair as I've
> stated: you can have 10000 processes attached to the root mem cgroup with
> rss of 80MB each and a leaf mem cgroup with 100 processes of 1MB rss each,
> and the oom killer is going to target the leaf mem cgroup as a result of
> this apples vs oranges comparison.
>
> In case it's not clear, the 10000 processes of 80MB rss each are the most
> likely contributor to a system-wide oom kill. Unfortunately, the
> heuristic introduced by this patchset is broken wrt a fair comparison of
> the root mem cgroup usage.
>
> > Or, if someone comes up with an idea for a better approximation, it can be
> > implemented as a separate enhancement on top of the initial implementation.
> > This is more than welcome.
>
> We don't need a better approximation, we need a fair comparison. The
> heuristic that this patchset is implementing is based on the usage of
> individual mem cgroups. For the root mem cgroup to be considered
> eligible, we need to understand its usage. That usage is _not_ what is
> implemented by this patchset, which is the largest rss of a single
> attached process. This, in fact, is not an "approximation" at all. In
> the example of 10000 processes attached with 80MB rss each, the usage of
> the root mem cgroup is _not_ 80MB.

It's hard to imagine a "healthy" setup with 10000 processes in the root
memory cgroup, and even if we kill one process, we will still have 9999
remaining processes. I agree with you up to a point, but it's not a
real-world example.

> I'll restate that oom killing a process is a last resort for the kernel,
> but it also must be able to make a smart decision. Targeting dozens of
> 1MB processes instead of 80MB processes because of a shortcoming in this
> implementation is not the appropriate selection, it's the opposite of the
> correct selection.
>
> > > I'll reiterate what I did on the last version of the patchset: considering
> > > only leaf memory cgroups easily allows users to defeat this heuristic and
> > > bias against all of their memory usage up to the largest process size
> > > amongst the set of processes attached. If the user creates N child mem
> > > cgroups for their N processes and attaches one process to each child, the
> > > _only_ thing this achieves is to defeat your heuristic and prefer other
> > > leaf cgroups simply because those other leaf cgroups did not do this.
> > >
> > > Effectively:
> > >
> > > 	for i in $(cat cgroup.procs); do mkdir $i; echo $i > $i/cgroup.procs; done
> > >
> > > will radically shift the heuristic from a score of all anonymous +
> > > unevictable memory for all processes to a score of the largest anonymous +
> > > unevictable memory for a single process. There is no downside or
> > > ramification for the end user in doing this. When comparing cgroups based
> > > on usage, it only makes sense to compare the hierarchical usage of that
> > > cgroup, so that attaching processes to descendants or splitting the
> > > implementation of a process into several smaller individual processes does
> > > not allow this heuristic to be defeated.
> >
> > To all that has been said previously, I can only add that cgroup v2 allows
> > limiting the number of cgroups in a sub-tree:
> > 1a926e0bbab8 ("cgroup: implement hierarchy limits").
>
> So the solution to
>
> 	for i in $(cat cgroup.procs); do mkdir $i; echo $i > $i/cgroup.procs; done
>
> evading all oom kills for your mem cgroup is to limit the number of
> cgroups that can be created by the user? With a unified cgroup hierarchy,
> that doesn't work well if I wanted to actually constrain these individual
> processes with different resource limits, like cpu usage. In fact, the
> user may not know it is effectively evading the oom killer entirely by
> constraining the cpu of individual processes, because the evasion is just
> a side-effect of this heuristic.
>
> You chose not to respond to my reiteration that userspace has absolutely
> no control over victim selection with the new heuristic without setting
> all processes to be oom disabled via /proc/pid/oom_score_adj. If I have a
> very important job that is running on a system that is really supposed to
> use 80% of memory, I need to be able to specify that it should not be oom
> killed based on user goals. Setting all processes to be oom disabled in
> the important mem cgroup, to avoid them being oom killed unless absolutely
> necessary in a system oom condition, is not a robust solution: (1) the mem
> cgroup livelocks if it reaches its own mem cgroup limit and (2) the system
> panic()'s if these preferred mem cgroups are the only consumers left on
> the system. With overcommit, both of these possibilities exist in the
> wild, and the problem is only a result of an implementation detail of this
> patchset.
>
> For these reasons: the unfair comparison of root mem cgroup usage that
> biases against that mem cgroup in system oom conditions, the ability of
> users to completely evade the oom killer by attaching all processes to
> child cgroups either purposefully or inadvertently, and the inability of
> userspace to effectively control oom victim selection:
>
> Nacked-by: David Rientjes

So, if we sum the oom_score of tasks belonging to the root memory cgroup,
will it fix the problem? It might have some drawbacks as well (especially
around oom_score_adj), but it's doable if we ignore tasks which are not
the owners of their mm struct.
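To make the idea concrete, here is roughly what I have in mind. It's only
a sketch to illustrate the approach, not patch code: root_memcg_score() is
a made-up name, locking beyond RCU is elided, and how oom_score_adj should
be treated when summing is exactly the open question I mentioned.

	/*
	 * Illustration only: sum per-task badness over the root memory
	 * cgroup instead of taking the largest single task.  All names
	 * besides for_each_process()/oom_badness() are hypothetical.
	 */
	static unsigned long root_memcg_score(unsigned long totalpages)
	{
		struct task_struct *p;
		unsigned long score = 0;

		rcu_read_lock();
		for_each_process(p) {
			/* Count only tasks attached to the root memory cgroup. */
			if (!mem_cgroup_is_root(mem_cgroup_from_task(p)))
				continue;

			/*
			 * Ignore tasks which are not the owners of their mm,
			 * so that a shared address space is counted only once.
			 */
			if (!p->mm || p->mm->owner != p)
				continue;

			/*
			 * oom_badness() folds in oom_score_adj, which is one
			 * of the drawbacks of naively summing the scores.
			 */
			score += oom_badness(p, NULL, NULL, totalpages);
		}
		rcu_read_unlock();

		return score;
	}

Whether summing the oom_score_adj-adjusted values like this makes sense,
or whether the adjustment should be applied once per cgroup, is the part
I'm least sure about.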
> > > This is racy because mem_cgroup_select_oom_victim() found an eligible
> > > oc->chosen_memcg that is not INFLIGHT_VICTIM with at least one eligible
> > > process, but mem_cgroup_scan_task(oc->chosen_memcg) did not. It means
> > > that if a process cannot be killed because of oom_unkillable_task(), the
> > > only eligible processes moved or exited, or the /proc/pid/oom_score_adj
> > > of the eligible processes changed, we end up falling back to the complete
> > > tasklist scan. It would be better for oom_evaluate_memcg() to consider
> > > oom_unkillable_task() and also retry in the case where
> > > oom_kill_memcg_victim() returns NULL.
> >
> > I agree with you here. The fallback to the existing mechanism is
> > implemented to be safe, especially in the case of a global OOM. When we
> > get more confidence in the cgroup-aware OOM killer's reliability, we can
> > change this behavior. Personally, I would prefer to get rid of looking at
> > all tasks just to find a pre-existing OOM victim, but it might be quite
> > tricky to implement.
>
> I'm not sure what this has to do with confidence in this patchset's
> reliability? The race obviously exists: mem_cgroup_select_oom_victim()
> found an eligible process in oc->chosen_memcg, but it was either
> ineligible later because of oom_unkillable_task(), it moved, or it exited.
> It's a race. Users who opt in to this new heuristic should not have to
> worry that a racing exit causes a completely unexpected process from an
> unexpected memcg to be killed, when it should be possible to retry and
> select the correct victim.

Yes, I have to agree here. It looks like we can't just fall back to the
original policy.
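At the call site, retrying would then probably look something like the
sketch below. Again, this is illustrative only: I'm assuming the v11 entry
points, i.e. mem_cgroup_select_oom_victim() reporting whether it set
oc->chosen_memcg and oom_kill_memcg_victim() returning the victim or NULL,
and the retry bound is arbitrary.

	/*
	 * Sketch only: retry the cgroup-aware selection when the chosen
	 * memcg raced with us (victim exited, moved, or became
	 * unkillable) instead of falling back to the tasklist scan.
	 * Both helpers' signatures are assumed from this patchset.
	 */
	static bool oom_kill_from_memcg(struct oom_control *oc)
	{
		int retries = 10;

		while (retries--) {
			/* No eligible memcg left: the cgroup path gives up. */
			if (!mem_cgroup_select_oom_victim(oc))
				return false;

			/*
			 * The selected memcg may have raced with us: its last
			 * eligible task exited, moved away or became
			 * unkillable.  Select again instead of scanning the
			 * whole tasklist.
			 */
			if (oom_kill_memcg_victim(oc))
				return true;
		}

		return false;
	}

Thanks!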