From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ig0-f197.google.com (mail-ig0-f197.google.com [209.85.213.197]) by kanga.kvack.org (Postfix) with ESMTP id 2DFDA6B007E for ; Wed, 8 Jun 2016 09:52:21 -0400 (EDT) Received: by mail-ig0-f197.google.com with SMTP id lp2so14511406igb.3 for ; Wed, 08 Jun 2016 06:52:21 -0700 (PDT) Received: from EUR01-DB5-obe.outbound.protection.outlook.com (mail-db5eur01on0121.outbound.protection.outlook.com. [104.47.2.121]) by mx.google.com with ESMTPS id y71si900281oia.24.2016.06.08.06.52.19 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-SHA bits=128/128); Wed, 08 Jun 2016 06:52:20 -0700 (PDT) Date: Wed, 8 Jun 2016 16:52:04 +0300 From: Vladimir Davydov Subject: Re: [PATCH 2/2] mm: oom: deduplicate victim selection code for memcg and global oom Message-ID: <20160608135204.GA30465@esperanza> References: <40e03fd7aaf1f55c75d787128d6d17c5a71226c2.1464358556.git.vdavydov@virtuozzo.com> <3bbc7b70dae6ace0b8751e0140e878acfdfffd74.1464358556.git.vdavydov@virtuozzo.com> <20160608083334.GF22570@dhcp22.suse.cz> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Disposition: inline In-Reply-To: <20160608083334.GF22570@dhcp22.suse.cz> Sender: owner-linux-mm@kvack.org List-ID: To: Michal Hocko Cc: Andrew Morton , Tetsuo Handa , David Rientjes , Johannes Weiner , linux-mm@kvack.org, linux-kernel@vger.kernel.org On Wed, Jun 08, 2016 at 10:33:34AM +0200, Michal Hocko wrote: > On Fri 27-05-16 17:17:42, Vladimir Davydov wrote: > [...] > > @@ -970,26 +1028,25 @@ bool out_of_memory(struct oom_control *oc) > > !oom_unkillable_task(current, NULL, oc->nodemask) && > > current->signal->oom_score_adj != OOM_SCORE_ADJ_MIN) { > > get_task_struct(current); > > - oom_kill_process(oc, current, 0, totalpages, > > - "Out of memory (oom_kill_allocating_task)"); > > + oom_kill_process(oc, current, 0, totalpages); > > return true; > > } > > Do we really want to introduce sysctl_oom_kill_allocating_task to memcg > as well? Not sure, but why not? We take into account dump_tasks and panic_on_oom on memcg oom so why should we treat this sysctl differently? > The heuristic is quite dubious even for the global context IMHO > because it leads to a very random behavior. > > > p = select_bad_process(oc, &points, totalpages); > > /* Found nothing?!?! Either we hang forever, or we panic. */ > > - if (!p && !is_sysrq_oom(oc)) { > > + if (!p && !is_sysrq_oom(oc) && !oc->memcg) { > > dump_header(oc, NULL); > > panic("Out of memory and no killable processes...\n"); > > } > > if (p && p != (void *)-1UL) { > > - oom_kill_process(oc, p, points, totalpages, "Out of memory"); > > + oom_kill_process(oc, p, points, totalpages); > > /* > > * Give the killed process a good chance to exit before trying > > * to allocate memory again. > > */ > > schedule_timeout_killable(1); > > } > > - return true; > > + return !!p; > > } > > Now if you look at out_of_memory() the only shared "heuristic" with the > memcg part is the bypass for the exiting tasks. bypass exiting task (task_will_free_mem) check for panic (check_panic_on_oom) oom badness evaluation (oom_scan_process_thread or oom_evaluate_task after your patch) points calculation + kill (oom_kill_process) And if you need to modify any of these function calls or add yet another check, you have to do it twice. Ugly. > Plus both need the oom_lock. I believe locking could be unified for global/memcg oom cases too. > You have to special case oom notifiers, panic on no victim handling and > I guess the oom_kill_allocating task is not intentional either. So I > am not really sure this is an improvement. I even hate how we conflate > sysrq vs. regular global oom context together but my cleanup for that > has failed in the past. > > The victim selection code can be reduced because it is basically > shared between the two, only the iterator differs. But I guess that > can be eliminated by a simple helper. IMHO exporting a bunch of very oom-specific helpers (like those I enumerated above), partially revealing oom implementation, instead of well defined memcg helpers that could be reused anywhere else looks ugly. It's like having shrink_zone implementation both in vmscan.c and memcontrol.c with shrink_slab, shrink_lruvec, etc. exported, because we need to iterate over cgroups there. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org