From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1426315AbcFHOql (ORCPT );
	Wed, 8 Jun 2016 10:46:41 -0400
Received: from mail-wm0-f68.google.com ([74.125.82.68]:34781 "EHLO
	mail-wm0-f68.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1424687AbcFHOqj (ORCPT );
	Wed, 8 Jun 2016 10:46:39 -0400
Date: Wed, 8 Jun 2016 16:46:36 +0200
From: Michal Hocko
To: Vladimir Davydov
Cc: Andrew Morton, Tetsuo Handa, David Rientjes, Johannes Weiner,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/2] mm: oom: deduplicate victim selection code for memcg and global oom
Message-ID: <20160608144636.GN22570@dhcp22.suse.cz>
References: <40e03fd7aaf1f55c75d787128d6d17c5a71226c2.1464358556.git.vdavydov@virtuozzo.com>
	<3bbc7b70dae6ace0b8751e0140e878acfdfffd74.1464358556.git.vdavydov@virtuozzo.com>
	<20160608083334.GF22570@dhcp22.suse.cz>
	<20160608135204.GA30465@esperanza>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20160608135204.GA30465@esperanza>
User-Agent: Mutt/1.6.0 (2016-04-01)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: X-Mailing-List: linux-kernel@vger.kernel.org

On Wed 08-06-16 16:52:04, Vladimir Davydov wrote:
> On Wed, Jun 08, 2016 at 10:33:34AM +0200, Michal Hocko wrote:
> > On Fri 27-05-16 17:17:42, Vladimir Davydov wrote:
> > [...]
> > > @@ -970,26 +1028,25 @@ bool out_of_memory(struct oom_control *oc)
> > >  	    !oom_unkillable_task(current, NULL, oc->nodemask) &&
> > >  	    current->signal->oom_score_adj != OOM_SCORE_ADJ_MIN) {
> > >  		get_task_struct(current);
> > > -		oom_kill_process(oc, current, 0, totalpages,
> > > -				 "Out of memory (oom_kill_allocating_task)");
> > > +		oom_kill_process(oc, current, 0, totalpages);
> > >  		return true;
> > >  	}
> >
> > Do we really want to introduce sysctl_oom_kill_allocating_task to memcg
> > as well?
>
> Not sure, but why not?
> We take into account dump_tasks and panic_on_oom on memcg oom, so why
> should we treat this sysctl differently?

Well, for one thing nobody has requested that, and it would be a user
visible change which might be unexpected. And as already said, I think
it was a mistake to introduce this sysctl in the first place. The
behavior is so random that I am not even sure it is usable in real
life. Spreading it further doesn't sound like a good idea to me.

[...]

> > Now if you look at out_of_memory() the only shared "heuristic" with the
> > memcg part is the bypass for the exiting tasks.
>
> bypass exiting task (task_will_free_mem)
> check for panic (check_panic_on_oom)
> oom badness evaluation (oom_scan_process_thread, or oom_evaluate_task
> after your patch)
> points calculation + kill (oom_kill_process)
>
> And if you need to modify any of these function calls or add yet another
> check, you have to do it twice. Ugly.

Ideally all those changes would happen inside those helpers. Also, if
you look at out_of_memory() and mem_cgroup_out_of_memory(), it is much
easier to follow the latter because it doesn't have the different
combinations of heuristics which only make sense for sysrq or global
oom.

> > Plus both need the oom_lock.
>
> I believe locking could be unified for the global/memcg oom cases too.

> > You have to special case oom notifiers, panic on no victim handling, and
> > I guess the oom_kill_allocating_task is not intentional either. So I
> > am not really sure this is an improvement. I even hate how we conflate
> > sysrq vs. regular global oom context together, but my cleanup for that
> > has failed in the past.
> >
> > The victim selection code can be reduced because it is basically
> > shared between the two, only the iterator differs. But I guess that
> > can be eliminated by a simple helper.
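The four shared steps quoted above could, in principle, be driven by one scan routine parameterized over the task iterator, which is the part that actually differs between the global and memcg paths. Below is a minimal userspace sketch of that idea; the `struct task`, its `badness` field, and the array-backed iterator are simplified stand-ins invented for illustration, not the kernel's real types or helpers:

```c
#include <assert.h>
#include <stddef.h>

#define OOM_SCORE_ADJ_MIN (-1000)

/* Simplified stand-in for task_struct: only the fields the scan needs. */
struct task {
	int oom_score_adj;	/* OOM_SCORE_ADJ_MIN marks the task unkillable */
	long badness;		/* precomputed "points" for this sketch */
	int exiting;		/* stands in for task_will_free_mem() */
};

/* Iterator callback: a global oom would walk all tasks, a memcg oom
 * would walk one cgroup's tasks; the scan below does not care which. */
typedef const struct task *(*task_iter_fn)(void *cookie);

/* Shared victim selection: skip unkillable tasks, short-circuit on a
 * task that is already exiting (it will free memory anyway), otherwise
 * pick the highest badness. Returns NULL if no task is eligible. */
static const struct task *select_victim(task_iter_fn next, void *cookie)
{
	const struct task *victim = NULL;
	const struct task *t;

	while ((t = next(cookie)) != NULL) {
		if (t->oom_score_adj == OOM_SCORE_ADJ_MIN)
			continue;
		if (t->exiting)
			return t;
		if (!victim || t->badness > victim->badness)
			victim = t;
	}
	return victim;
}

/* Trivial array-backed iterator standing in for for_each_process()
 * or the memcg task walk. */
struct array_iter {
	const struct task *tasks;
	size_t n, pos;
};

static const struct task *array_next(void *cookie)
{
	struct array_iter *it = cookie;

	return it->pos < it->n ? &it->tasks[it->pos++] : NULL;
}
```

With this shape, adding one more heuristic means touching `select_victim()` once instead of patching both `out_of_memory()` and `mem_cgroup_out_of_memory()`, while neither side has to export its iteration machinery to the other.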
>
> IMHO exporting a bunch of very oom-specific helpers (like those I
> enumerated above), partially revealing the oom implementation, instead
> of well defined memcg helpers that could be reused anywhere else looks
> ugly. It's like having the shrink_zone implementation both in vmscan.c
> and memcontrol.c with shrink_slab, shrink_lruvec, etc. exported,
> because we need to iterate over cgroups there.

I agree that the API for the OOM killer parts is not really great. I am
just a little bit afraid that the iterators are over-engineered. I am
not even sure whether those have any other potential users. The diffstat
of the cleanup I have here right now is really encouraging:
---
 include/linux/oom.h | 17 ++++-------
 mm/memcontrol.c     | 48 +++--------------------------
 mm/oom_kill.c       | 87 ++++++++++++++++++++++++++++++-----------------------
 3 files changed, 60 insertions(+), 92 deletions(-)

compared to yours:

 include/linux/memcontrol.h |  15 ++++
 include/linux/oom.h        |  51 -------------
 mm/memcontrol.c            | 112 ++++++++++-----------------
 mm/oom_kill.c              | 183 +++++++++++++++++++++++++++++----------------
 4 files changed, 176 insertions(+), 185 deletions(-)

We save more LOC with a smaller patch. I know this is not an absolute
metric, but I would rather go with simplicity than elaborate APIs. This
is all pretty much mm/memcg internal.

Anyway, I do not have a strong opinion and will not insist. I can post
the full cleanup with the suggestions from Tetsuo integrated if you are
interested.

--
Michal Hocko
SUSE Labs