From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Mon, 2 Oct 2017 13:47:12 +0100
From: Roman Gushchin
To: Michal Hocko
CC: Shakeel Butt, Tim Hockin, Johannes Weiner, Tejun Heo, David Rientjes,
 Linux MM, Vladimir Davydov, Tetsuo Handa, Andrew Morton, Cgroups,
 "linux-kernel@vger.kernel.org"
Subject: Re: [v8 0/4] cgroup-aware OOM killer
Message-ID: <20171002124712.GA17638@castle.DHCP.thefacebook.com>
References: <20170926121300.GB23139@castle.dhcp.TheFacebook.com>
 <20170926133040.uupv3ibkt3jtbotf@dhcp22.suse.cz>
 <20170926172610.GA26694@cmpxchg.org>
 <20170927074319.o3k26kja43rfqmvb@dhcp22.suse.cz>
 <20170927162300.GA5623@castle.DHCP.thefacebook.com>
 <20171002122434.llbaarb6yw3o3mx3@dhcp22.suse.cz>
In-Reply-To: <20171002122434.llbaarb6yw3o3mx3@dhcp22.suse.cz>
User-Agent: Mutt/1.9.0 (2017-09-02)
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Oct 02, 2017 at 02:24:34PM +0200, Michal Hocko wrote:
> On Sun 01-10-17 16:29:48, Shakeel Butt wrote:
> > >
> > > Going back to Michal's example, say the user configured the following:
> > >
> > >        root
> > >       /    \
> > >      A      D
> > >     / \
> > >    B   C
> > >
> > > A global OOM event happens
> > > and we find this:
> > > - A > D
> > > - B, C, D are oomgroups
> > >
> > > What the user is telling us is that B, C, and D are compound memory
> > > consumers. They cannot be divided into their task parts from a memory
> > > point of view.
> > >
> > > However, the user doesn't say the same for A: the A subtree summarizes
> > > and controls aggregate consumption of B and C, but without groupoom
> > > set on A, the user says that A is in fact divisible into independent
> > > memory consumers B and C.
> > >
> > > If we don't have to kill all of A, but we'd have to kill all of D,
> > > does it make sense to compare the two?
> > >
> >
> > I think Tim has given a very clear explanation why comparing A & D makes
> > perfect sense. However I think the above example, a single user system
> > where a user has designed and created the whole hierarchy and then
> > attaches different jobs/applications to different nodes in this
> > hierarchy, is also a valid scenario.
>
> Yes and nobody is disputing that, really. I guess the main disconnect
> here is that different people want to have more detailed control over
> the victim selection while the patchset tries to handle the most
> simplistic scenario where no userspace control over the selection is
> required. And I would claim that this will be the vast majority of setups
> and we should address it first.
>
> A more fine grained control needs some more thinking to come up with a
> sensible and long term sustainable API. Just look back at the
> oom_score_adj story and how it ended up unusable in the end (well apart
> from never/always kill corner cases). Let's not repeat that again now.
>
> I strongly believe that we can come up with something - be it priority
> based, BPF based or module based selection. But let's start simple with
> the most basic scenario first with a most sensible semantic implemented.

Totally agree.

> I believe the latest version (v9) looks sensible from the semantic point
> of view and we should focus on making it into a mergeable shape.

The only thing is that, after some additional thinking, I no longer
think that implicit propagation of oom_group is a good idea.

Let me explain: assume we have memcg A with memory.max and
memory.oom_group set, and nested memcg A/B with memory.max set.
Let's imagine we have an OOM event in A/B. What is the expected
system behavior?

We have an OOM scoped to A/B, and any action should also be scoped
to A/B. We really shouldn't touch processes which do not belong to A/B.
That means we should either kill the biggest process in A/B, or all
processes in A/B. It's natural to make A/B/memory.oom_group responsible
for this decision. It's strange to make it depend on
A/memory.oom_group, IMO. It really makes no sense, and it makes the
oom_group knob really hard to describe.

Also, after some off-list discussion, we've realized that the
memory.oom_group knob should be delegatable. The workload should have
control over it to express dependencies between processes.

Thanks!
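
P.S. To make the A vs. A/B case above concrete, here is a minimal sketch
in Python. It is a toy model under stated assumptions, not the patchset's
code: the Memcg class, victims_for_scoped_oom() and the pids/sizes are
all made up for illustration, and it skips the per-memcg comparison that
real victim selection does inside a subtree. It only shows that the
decision depends on the OOMing memcg's own oom_group and never leaves
that memcg.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Memcg:
    """Toy stand-in for a memory cgroup (hypothetical, not a kernel structure)."""
    name: str
    oom_group: bool = False                               # models memory.oom_group
    procs: Dict[int, int] = field(default_factory=dict)   # pid -> memory footprint
    children: List["Memcg"] = field(default_factory=list)

    def all_procs(self) -> Dict[int, int]:
        """Processes in this cgroup and in everything below it."""
        merged = dict(self.procs)
        for child in self.children:
            merged.update(child.all_procs())
        return merged


def victims_for_scoped_oom(oom_memcg: Memcg) -> List[int]:
    """Pick victims for an OOM scoped to oom_memcg.

    Only oom_memcg's own oom_group is consulted; ancestors are ignored,
    and no process outside oom_memcg is ever returned.
    """
    procs = oom_memcg.all_procs()
    if not procs:
        return []
    if oom_memcg.oom_group:
        return sorted(procs)                 # kill every process in the memcg
    return [max(procs, key=procs.get)]       # kill only the biggest process


# A has oom_group set, A/B does not; the OOM event happens in A/B.
b = Memcg("A/B", oom_group=False, procs={101: 300, 102: 700})
a = Memcg("A", oom_group=True, procs={100: 900}, children=[b])

print(victims_for_scoped_oom(b))   # [102]: A's oom_group has no effect here
```

With A/B's oom_group set instead, the same call returns both 101 and 102,
and pid 100 (which lives in A only) is never touched in either case.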