From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754055AbbJBNzR (ORCPT ); Fri, 2 Oct 2015 09:55:17 -0400 Received: from mx1.redhat.com ([209.132.183.28]:53203 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752613AbbJBNzN (ORCPT ); Fri, 2 Oct 2015 09:55:13 -0400 Date: Fri, 2 Oct 2015 15:52:01 +0200 From: Oleg Nesterov To: Tetsuo Handa Cc: mhocko@kernel.org, akpm@linux-foundation.org, rientjes@google.com, kwalker@redhat.com, skozina@redhat.com, linux-kernel@vger.kernel.org Subject: Re: [PATCH -mm v2 1/3] mm/oom_kill: remove the wrong fatal_signal_pending() check in oom_kill_process() Message-ID: <20151002135201.GA28533@redhat.com> References: <20151001150010.GB6781@redhat.com> <20151001152739.GJ24077@dhcp22.suse.cz> <20151001154115.GA10342@redhat.com> <20151001161916.GK24077@dhcp22.suse.cz> <20151001175319.GA16313@redhat.com> <201510022032.IFC65105.JFtMOQOVSHFLOF@I-love.SAKURA.ne.jp> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <201510022032.IFC65105.JFtMOQOVSHFLOF@I-love.SAKURA.ne.jp> User-Agent: Mutt/1.5.18 (2008-05-17) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Tetsuo, sorry, I don't understand your question... On 10/02, Tetsuo Handa wrote: > > Oleg Nesterov wrote: > > On 10/01, Michal Hocko wrote: > > > > > > zap_process will add SIGKILL to all threads but the > > > current which will go on without being killed and if this is not a > > > thread group leader then we would miss it. > > > > Yes. And note that de_thread() does the same. Speaking of oom-killer > > this is mostly fine, the execing thread is going to release its old > > ->mm and it has already passed the copy_strings() stage which can use > > a lot more memory. > > So, we have the same wrong fatal_signal_pending() check in out_of_memory() Yes, sure, it is not right too. Again, this is even documented in d003f371b27016354c: fatal_signal_pending() can be true because of SIGNAL_GROUP_COREDUMP so out_of_memory() and mem_cgroup_out_of_memory() shouldn't blindly trust it. This is off-topic in a sense that this series only tries to ensure that if we are going to kill a memory hog we can't miss a process which shares the same mm (ignoring the OOM_SCORE_ADJ_MIN condition below). > /* > * If current has a pending SIGKILL or is exiting, then automatically > * select it. The goal is to allow it to allocate so that it may > * quickly exit and free its memory. > * > * But don't select if current has already released its mm and cleared > * TIF_MEMDIE flag at exit_mm(), otherwise an OOM livelock may occur. > */ > if (current->mm && > (fatal_signal_pending(current) || task_will_free_mem(current))) { > mark_oom_victim(current); > return true; > } > > because it is possible that T starts the coredump, T sends SIGKILL to P, > P calls out_of_memory() on GFP_FS allocation, yes, and since fatal_signal_pending() == T we do not even check task_will_free_mem(). > P misses to set SIGKILL on T? > > Since T sends SIGKILL to all clone(CLONE_VM) tasks upon coredump, P needs > to do > > [...snip...] > after mark_oom_victim(current) in case T is not in the same thread group? I do not see how this depends on "not in the same thread group". This fatal_signal_pending() doesn't look right in any case. > If yes, what happens if some task failed to receive SIGKILL due to > p->signal->oom_score_adj == OOM_SCORE_ADJ_MIN condition? Oh. This is another issue. I already tried to suggest to remove this check. But this needs more discussion, hopefully we can do this later. Oleg.