From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 3 Aug 2017 13:47:51 +0100
From: Roman Gushchin
To: Michal Hocko
CC: Vladimir Davydov, Johannes Weiner, Tetsuo Handa, David Rientjes,
	Tejun Heo
Subject: Re: [v4 2/4] mm, oom: cgroup-aware OOM killer
Message-ID: <20170803124751.GA24563@castle.dhcp.TheFacebook.com>
References: <20170726132718.14806-1-guro@fb.com>
	<20170726132718.14806-3-guro@fb.com>
	<20170801145435.GN15774@dhcp22.suse.cz>
	<20170801152548.GA29502@castle.dhcp.TheFacebook.com>
	<20170801170302.GB15518@dhcp22.suse.cz>
	<20170801181352.GA26074@castle.DHCP.thefacebook.com>
	<20170802072900.GA2524@dhcp22.suse.cz>
In-Reply-To: <20170802072900.GA2524@dhcp22.suse.cz>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Aug 02, 2017 at 09:29:01AM +0200, Michal Hocko wrote:
> On Tue 01-08-17 19:13:52, Roman Gushchin wrote:
> > On Tue, Aug 01, 2017 at 07:03:03PM +0200, Michal Hocko wrote:
> > > On Tue 01-08-17 16:25:48, Roman Gushchin wrote:
> > > > On Tue, Aug 01, 2017 at 04:54:35PM +0200, Michal Hocko wrote:
> > > [...]
> > > > > I would rip out the oom_kill_process into a separate patch.
> > > >
> > > > It was a separate patch, I've merged it based on Vladimir's feedback.
> > > > No problem, I can split it back out.
> > >
> > > It would make the review slightly easier
> > > >
> > > > > > -static void oom_kill_process(struct oom_control *oc, const char *message)
> > > > > > +static void __oom_kill_process(struct task_struct *victim)
> > > > >
> > > > > To the rest of the patch. I have to say I do not quite like how it is
> > > > > implemented. I was hoping for something much simpler which would hook
> > > > > into oom_evaluate_task. If a task belongs to a memcg with kill-all flag
> > > > > then we would update the cumulative memcg badness (more specifically the
> > > > > badness of the topmost parent with kill-all flag). Memcg will then
> > > > > compete with existing self contained tasks (oom_badness will have to
> > > > > tell whether points belong to a task or a memcg to allow the caller to
> > > > > deal with it). But it shouldn't be much more complex than that.
> > > >
> > > > I'm not sure it will be any simpler. Basically I'm doing the same:
> > > > the difference is that you want to iterate over tasks and for each
> > > > task traverse the memcg tree, update per-cgroup oom score and find
> > > > the corresponding memcg(s) with the kill-all flag. I'm doing the opposite:
> > > > traverse the cgroup tree, and for each leaf cgroup iterate over processes.
> > >
> > > Yeah but this doesn't fit very well to the existing scheme so we would
> > > need two different schemes which is not ideal from a maintenance point
> > > of view. We also do not have to duplicate all the tricky checks we
> > > already do in oom_evaluate_task. So I would prefer if we could try to
> > > hook there and do the special handling there.
> >
> > I hope that iterating over all tasks just to check if there are
> > in-flight OOM victims might be optimized at some point.
> > That means we would be able to choose a victim much more cheaply.
> > It's not easy, but it feels like the right direction to go.
>
> You would have to count per oom domain and that sounds quite
> unfeasible to me.

It's hard, but traversing the whole cgroup tree from bottom to top for
each task is just not scalable. This is exactly why I've chosen a
compromise right now: let's iterate over all tasks, but do it by
iterating over the cgroup tree.

> >
> > Also, adding new tricks to the oom_evaluate_task() will make the code
> > even more hairy. Some of the existing tricks are useless for memcg selection.
>
> Not sure what you mean but oom_evaluate_task has been usable for both
> global and memcg oom paths so far. I do not see any reason why this
> shouldn't hold for a different oom killing strategy.

Yes, but in both cases we've evaluated tasks, not cgroups.

> >
> > > > Also, please note that even without the kill-all flag the decision is
> > > > made at the per-cgroup level (except tasks in the root cgroup).
> > >
> > > Yeah and I am not sure this is a reasonable behavior. Why should we
> > > consider memcgs which are not kill-all as a single entity?
> >
> > I think it's reasonable to choose a cgroup/container to blow off based on
> > the cgroup oom_priority/size (including hierarchical settings), and then
> > kill the biggest task or all tasks depending on cgroup settings.
>
> But that doesn't mean you have to treat even !kill-all memcgs like a
> single entity. In fact we should compare killable entities, which are
> either a task or the whole memcg if configured that way.

I believe it's an absolutely valid user intention to prioritize some
cgroups over others, even if only one task should be killed in case of
OOM.

Thanks!

Roman
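
For illustration only: a toy userspace model (Python, not kernel code) of
the two traversal orders discussed above. Everything in it is an
assumption made for the sketch: the `Cgroup` class, the integer "badness"
values, the example tree, and the idea of counting per-cgroup score
writes as a rough cost measure. It does not reproduce oom_evaluate_task()
or the patch itself; it only shows that both orders arrive at the same
hierarchical scores while doing a different amount of bookkeeping.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Cgroup:
    """Hypothetical stand-in for a memcg; tasks map a name to a badness."""
    name: str
    parent: Optional["Cgroup"] = None
    children: List["Cgroup"] = field(default_factory=list)
    tasks: Dict[str, int] = field(default_factory=dict)

    def add_child(self, name: str) -> "Cgroup":
        child = Cgroup(name, parent=self)
        self.children.append(child)
        return child


def flat_task_list(root: Cgroup) -> List[Tuple[Cgroup, int]]:
    """Model of a global task list: (owning cgroup, badness) pairs."""
    out, stack = [], [root]
    while stack:
        cg = stack.pop()
        out.extend((cg, badness) for badness in cg.tasks.values())
        stack.extend(cg.children)
    return out


def scores_task_first(tasks: List[Tuple[Cgroup, int]]):
    """Iterate over tasks; for each one walk up and update every ancestor."""
    scores: Dict[str, int] = {}
    writes = 0
    for cg, badness in tasks:
        node: Optional[Cgroup] = cg
        while node is not None:
            scores[node.name] = scores.get(node.name, 0) + badness
            writes += 1
            node = node.parent
    return scores, writes


def scores_tree_first(root: Cgroup):
    """Walk the cgroup tree; each cgroup sums its own tasks and children."""
    scores: Dict[str, int] = {}
    writes = 0

    def visit(cg: Cgroup) -> int:
        nonlocal writes
        total = sum(cg.tasks.values()) + sum(visit(c) for c in cg.children)
        scores[cg.name] = total
        writes += 1
        return total

    visit(root)
    return scores, writes


def build_example() -> Cgroup:
    """Made-up hierarchy: root -> A -> {A1, A2}, root -> B."""
    root = Cgroup("root")
    a = root.add_child("A")
    a.add_child("A1").tasks.update({"t1": 10, "t2": 20})
    a.add_child("A2").tasks.update({"t3": 5})
    root.add_child("B").tasks.update({"t4": 30})
    return root


example = build_example()
by_task, task_writes = scores_task_first(flat_task_list(example))
by_tree, tree_writes = scores_tree_first(example)
print("same scores:", by_task == by_tree)
print("score writes, task-first:", task_writes)
print("score writes, tree-first:", tree_writes)
```

In this model the task-first order performs one score update per
(task, ancestor) pair, so its cost grows with the number of tasks times
the tree depth, while the tree-first order writes each cgroup's score
once. How closely that matches the real cost in the kernel is not
something this sketch can show.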