Date: Thu, 18 May 2017 20:20:50 +0100
From: Roman Gushchin
To: Balbir Singh
CC: Michal Hocko, Johannes Weiner, Tejun Heo, Li Zefan, Vladimir Davydov,
    Tetsuo Handa, cgroups@vger.kernel.org, open list:DOCUMENTATION,
    linux-kernel@vger.kernel.org, linux-mm
Subject: Re: [RFC PATCH] mm, oom: cgroup-aware OOM-killer
Message-ID: <20170518192050.GA1648@castle>
References: <1495124884-28974-1-git-send-email-guro@fb.com>
    <20170518173002.GC30148@dhcp22.suse.cz>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, May 19, 2017 at 04:37:27AM +1000, Balbir Singh wrote:
> On Fri, May 19, 2017 at 3:30 AM, Michal Hocko wrote:
> > On Thu 18-05-17 17:28:04, Roman Gushchin wrote:
> >> Traditionally, the OOM killer is operating on a process level.
> >> Under oom conditions, it finds a process with the highest oom score
> >> and kills it.
> >>
> >> This behavior doesn't suit well the system with many running
> >> containers. There are two main issues:
> >>
> >> 1) There is no fairness between containers. A small container with
> >> a few large processes will be chosen over a large one with huge
> >> number of small processes.
> >>
> >> 2) Containers often do not expect that some random process inside
> >> will be killed. So, in general, a much safer behavior is
> >> to kill the whole cgroup. Traditionally, this was implemented
> >> in userspace, but doing it in the kernel has some advantages,
> >> especially in a case of a system-wide OOM.
> >>
> >> To address these issues, cgroup-aware OOM killer is introduced.
> >> Under OOM conditions, it looks for a memcg with highest oom score,
> >> and kills all processes inside.
> >>
> >> Memcg oom score is calculated as a size of active and inactive
> >> anon LRU lists, unevictable LRU list and swap size.
> >>
> >> For a cgroup-wide OOM, only cgroups belonging to the subtree of
> >> the OOMing cgroup are considered.
> >
> > While this might make sense for some workloads/setups it is not a
> > generally acceptable policy IMHO. We have discussed that different OOM
> > policies might be interesting few years back at LSFMM but there was no
> > real consensus on how to do that. One possibility was to allow bpf like
> > mechanisms. Could you explore that path?
>
> I agree, I think it needs more thought. I wonder if the real issue is something
> else. For example
>
> 1. Did we overcommit a particular container too much?

Imagine you have a machine with multiple containers, each with its own
process tree, and the machine is overcommitted, i.e. the sum of the
containers' memory limits is larger than the amount of available RAM.
In case of a system-wide OOM, some random container will be affected.

Historically, this problem was solved by a userspace daemon which
monitored OOM events and cleaned up the affected containers.
But this approach can't solve the main problem: non-optimal selection
of a victim.

> 2. Do we need something like https://lwn.net/Articles/604212/ to solve
> the problem?

I don't think it's related.

> 3. We have oom notifiers now, could those be used (assuming you are interested
> in non memcg related OOM's affecting a container

They can be used to inform a userspace daemon about an OOM that has
already happened, but they do not affect victim selection.

> 4. How do we determine limits for these containers? From a fairness
> perspective

Limits are usually set from some high-level understanding of the nature
of the tasks running inside, but overcommitting the machine is common
practice, I assume.

Thank you!

Roman
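
For readers following the thread, the selection rule from the quoted RFC
description and the overcommit condition from the reply above can be
sketched in userspace C. This is a minimal illustration and not code
from the patch: the struct, its field names, and the three helper
functions are all hypothetical, and all values are assumed to be counted
in pages.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-cgroup snapshot; names are illustrative, not kernel API. */
struct memcg_sample {
    const char *name;
    uint64_t active_anon;   /* pages on the active anon LRU list */
    uint64_t inactive_anon; /* pages on the inactive anon LRU list */
    uint64_t unevictable;   /* pages on the unevictable LRU list */
    uint64_t swap;          /* pages swapped out */
    uint64_t limit;         /* configured memory limit, in pages */
};

/* Score as described in the RFC text: anon LRUs + unevictable + swap. */
static uint64_t memcg_score(const struct memcg_sample *m)
{
    return m->active_anon + m->inactive_anon + m->unevictable + m->swap;
}

/* Return the group with the highest score, or NULL for an empty array. */
static const struct memcg_sample *pick_victim(const struct memcg_sample *groups,
                                              size_t n)
{
    const struct memcg_sample *victim = NULL;

    for (size_t i = 0; i < n; i++) {
        if (!victim || memcg_score(&groups[i]) > memcg_score(victim))
            victim = &groups[i];
    }
    return victim;
}

/* Overcommitted: the sum of all limits exceeds the available RAM. */
static int is_overcommitted(const struct memcg_sample *groups, size_t n,
                            uint64_t ram_pages)
{
    uint64_t total = 0;

    for (size_t i = 0; i < n; i++)
        total += groups[i].limit;
    return total > ram_pages;
}
```

With this rule a container made of many small processes can still be
picked ahead of one with a few large processes, because the comparison
is made on per-cgroup totals and not on individual process scores.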