Date: Tue, 1 Oct 2013 19:26:40 +0400
From: Sergey Dyasly
To: David Rientjes
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Andrew Morton, Michal Hocko, Rusty Russell, Sha Zhengju, Oleg Nesterov
Subject: Re: [PATCH] OOM killer: wait for tasks with pending SIGKILL to exit
Message-Id: <20131001192640.ed55682d3113b00b402bbef5@gmail.com>
References: <1378740624-2456-1-git-send-email-dserrg@gmail.com> <20130911190605.5528ee4563272dbea1ed56a6@gmail.com> <20130927185833.6c72b77ab105d70d4996ebef@gmail.com>

It seems to me that we are going nowhere with this discussion...
If you are ok with the first change in my patch regarding fatal_signal_pending,
I can send new patch with just that change.

On Mon, 30 Sep 2013 15:08:25 -0700 (PDT)
David Rientjes wrote:

> On Fri, 27 Sep 2013, Sergey Dyasly wrote:
>
> > What you are saying contradicts current OOMk code the way I read it. Comment in
> > oom_kill_process() says:
> >
> > "If the task is already exiting ... set TIF_MEMDIE so it can die quickly"
> >
> > I just want to know the right solution.
> >
>
> That's a comment, not code.
> The point of the PF_EXITING special handling
> in oom_kill_process() is to avoid telling sysadmins that a process has
> been killed to free memory when it has already called exit() and to avoid
> sacrificing one of its children for the exiting process.
>
> It may or may not need access to memory reserves to actually exit after
> PF_EXITING depending on whether it needs to allocate memory for
> coredumping or anything else.  So instead of waiting for it to recall the
> oom killer, TIF_MEMDIE is set anyway.  The point is that PF_EXITING
> processes can already get TIF_MEMDIE immediately when their memory
> allocation fails so there's no reason not to set it now as an
> optimization.
>
> But we definitely want to avoid printing anything to the kernel log when
> the process has already called exit() and issuing the SIGKILL at that
> point would be pointless.
>
> > You are mistaken, oom_kill_process() is only called from out_of_memory()
> > and mem_cgroup_out_of_memory().
> >
>
> out_of_memory() calls oom_kill_process() in two places, plus the call from
> mem_cgroup_out_of_memory(), making three calls in the tree.  Not that this
> matters in the slightest, though.
>
> > > Read the comment about why we don't emit anything to the kernel log in
> > > this case; the process is already exiting, there's no need to kill it or
> > > make anyone believe that it was killed.
> >
> > Yes, but there is already the PF_EXITING check in oom_scan_process_thread(),
> > and in this case oom_kill_process() won't be even called. That's why it's
> > redundant.
> >
>
> You apparently have no idea how long select_bad_process() runs on a large
> system with thousands of processes.  Keep in mind that SGI requested the
> addition of the oom_kill_allocating_task sysctl specifically because of
> how long select_bad_process() runs.
> The PF_EXITING check in
> oom_kill_process() is simply an optimization to return early and with
> access to memory reserves so it can exit as quickly as possible and
> without the kernel stating it's killing something that has already called
> exit().

--
Sergey Dyasly
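
For readers following the thread, the early-return behaviour described in the quoted reply can be modelled in user space roughly as below. This is only a sketch under stated assumptions, not the kernel's code: struct sketch_task, SKETCH_PF_EXITING and sketch_oom_kill_process() are hypothetical stand-ins, the memdie field merely models TIF_MEMDIE, and the child-selection step mentioned in the reply is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the kernel's PF_EXITING task flag. */
#define SKETCH_PF_EXITING 0x4u

/* Hypothetical, heavily reduced model of a task. */
struct sketch_task {
	const char *comm;     /* task name */
	unsigned int flags;   /* per-task flags, e.g. SKETCH_PF_EXITING */
	bool memdie;          /* models TIF_MEMDIE (access to memory reserves) */
	bool sigkill_sent;    /* models SIGKILL having been issued */
};

/*
 * Sketch of the behaviour described in the thread: a task that is
 * already exiting gets access to memory reserves and the function
 * returns early, without logging and without sending SIGKILL.
 * Returns the number of log lines emitted.
 */
static int sketch_oom_kill_process(struct sketch_task *victim)
{
	assert(victim != NULL);

	if (victim->flags & SKETCH_PF_EXITING) {
		/* Already exiting: let it finish quickly and stay silent. */
		victim->memdie = true;
		return 0;
	}

	/* Normal path: report the kill, grant reserves, send the signal. */
	printf("sketch: killing %s to free memory\n", victim->comm);
	victim->memdie = true;
	victim->sigkill_sent = true;
	return 1;
}
```

In this model, calling sketch_oom_kill_process() on a task with SKETCH_PF_EXITING set leaves sigkill_sent false and prints nothing, which is the "return early and with access to memory reserves" case from the reply. Any other task takes the logging and SIGKILL path.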