From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.6 required=3.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY, SPF_PASS,URIBL_BLOCKED,USER_AGENT_MUTT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id EB05FC61CE8 for ; Sat, 19 Jan 2019 07:12:28 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id AA38B2084C for ; Sat, 19 Jan 2019 07:12:28 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1547881948; bh=h1V7YtdLyyVFVg3jgiMip01Usk708NbgreTQy5I5sDU=; h=Date:From:To:Cc:Subject:References:In-Reply-To:List-ID:From; b=r0x5fXg+mlBVqmEHxDafHHSHDACtNFHUFWGH7HLysd/Cst+jwNPmjyPINzFgHB/l4 dUQRkP5XYVa2p+GfG6o8YzLFRrsHEaXwWTV1n4f2FNqShv21k0hd7H26PzVyEl6l86 k5Lj6RuoOt1fJ1zTVCcFSXpB7MFlTDeWPsCK0la8= Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727564AbfASHJi (ORCPT ); Sat, 19 Jan 2019 02:09:38 -0500 Received: from mx2.suse.de ([195.135.220.15]:36900 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1727496AbfASHJh (ORCPT ); Sat, 19 Jan 2019 02:09:37 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id B9E43AF45; Sat, 19 Jan 2019 07:09:35 +0000 (UTC) Date: Sat, 19 Jan 2019 08:09:34 +0100 From: Michal Hocko To: Shakeel Butt Cc: Johannes Weiner , David Rientjes , Andrew Morton , Tetsuo Handa , Roman Gushchin , linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: Re: [RFC PATCH] mm, oom: fix use-after-free in oom_kill_process Message-ID: <20190119070934.GD4087@dhcp22.suse.cz> References: <20190119005022.61321-1-shakeelb@google.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20190119005022.61321-1-shakeelb@google.com> User-Agent: Mutt/1.10.1 (2018-07-13) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri 18-01-19 16:50:22, Shakeel Butt wrote: [...] > On looking further it seems like the process selected to be oom-killed > has exited even before reaching read_lock(&tasklist_lock) in > oom_kill_process(). More specifically the tsk->usage is 1 which is due > to get_task_struct() in oom_evaluate_task() and the put_task_struct > within for_each_thread() frees the tsk and for_each_thread() tries to > access the tsk. The easiest fix is to do get/put across the > for_each_thread() on the selected task. Very well spotted! The code seems safe because we are careful to transfer the victim along with reference counting but I've totally missed that the loop itself needs a reference. It seems that this has been broken since the heuristic has been introduced. But I haven't checked it closely. I am still on vacation. > Now the next question is should we continue with the oom-kill as the > previously selected task has exited? However before adding more > complexity and heuristics, let's answer why we even look at the > children of oom-kill selected task? The objective was the work protection assuming that children did less work than their parrent. I find this argument a bit questionable because it highly depends a specific workload while it opens doors for problematic behavior at the same time. If you have a fork bomb like workload then it is basically hard to resolve the OOM condition as children have barely any memory so we keep looping killing tasks which will not free up much. So I am all for removing this heuristic. > The select_bad_process() has already > selected the worst process in the system/memcg. Due to race, the > selected process might not be the worst at the kill time but does that > matter matter? No, we don't I believe. The aim of the oom killer is to kill something. We will never be ideal here because this is a land of races. > The userspace can play with oom_score_adj to prefer > children to be killed before the parent. I looked at the history but it > seems like this is there before git history. > > Signed-off-by: Shakeel Butt Fixes: 5e9d834a0e0c ("oom: sacrifice child with highest badness score for parent") Cc: stable Acked-by: Michal Hocko Thanks! > --- > mm/oom_kill.c | 8 ++++++++ > 1 file changed, 8 insertions(+) > > diff --git a/mm/oom_kill.c b/mm/oom_kill.c > index 0930b4365be7..1a007dae1e8f 100644 > --- a/mm/oom_kill.c > +++ b/mm/oom_kill.c > @@ -981,6 +981,13 @@ static void oom_kill_process(struct oom_control *oc, const char *message) > * still freeing memory. > */ > read_lock(&tasklist_lock); > + > + /* > + * The task 'p' might have already exited before reaching here. The > + * put_task_struct() will free task_struct 'p' while the loop still try > + * to access the field of 'p', so, get an extra reference. > + */ > + get_task_struct(p); > for_each_thread(p, t) { > list_for_each_entry(child, &t->children, sibling) { > unsigned int child_points; > @@ -1000,6 +1007,7 @@ static void oom_kill_process(struct oom_control *oc, const char *message) > } > } > } > + put_task_struct(p); > read_unlock(&tasklist_lock); > > /* > -- > 2.20.1.321.g9e740568ce-goog -- Michal Hocko SUSE Labs