From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1755043AbbIRQ3a (ORCPT <rfc822;w@1wt.eu>);
	Fri, 18 Sep 2015 12:29:30 -0400
Received: from mx1.redhat.com ([209.132.183.28]:48743 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S932117AbbIRQ1W (ORCPT <rfc822;linux-kernel@vger.kernel.org>);
	Fri, 18 Sep 2015 12:27:22 -0400
Date: Fri, 18 Sep 2015 18:24:23 +0200
From: Oleg Nesterov <oleg@redhat.com>
To: Christoph Lameter <cl@linux.com>
Cc: Kyle Walker <kwalker@redhat.com>, akpm@linux-foundation.org,
        mhocko@suse.cz, rientjes@google.com, hannes@cmpxchg.org,
        vdavydov@parallels.com, linux-mm@kvack.org,
        linux-kernel@vger.kernel.org,
        Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>,
        Stanislav Kozina <skozina@redhat.com>
Subject: Re: [PATCH] mm/oom_kill.c: don't kill TASK_UNINTERRUPTIBLE tasks
Message-ID: <20150918162423.GA18136@redhat.com>
References: <1442512783-14719-1-git-send-email-kwalker@redhat.com> <20150917192204.GA2728@redhat.com> <alpine.DEB.2.11.1509181035180.11189@east.gentwo.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <alpine.DEB.2.11.1509181035180.11189@east.gentwo.org>
User-Agent: Mutt/1.5.18 (2008-05-17)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On 09/18, Christoph Lameter wrote:
>
> > But yes, such a deadlock is possible. I would really like to see the comments
> > from maintainers. In particular, I seem to recall that someone suggested to
> > try to kill another !TIF_MEMDIE process after timeout, perhaps this is what
> > we should actually do...
>
> Well yes here is a patch that kills another memdie process but there is
> some risk with such an approach of overusing the reserves.

Yes, I understand it is not that simple. And probably this is all I can
understand ;)

> --- linux.orig/mm/oom_kill.c	2015-09-18 10:38:29.601963726 -0500
> +++ linux/mm/oom_kill.c	2015-09-18 10:39:55.911699017 -0500
> @@ -265,8 +265,8 @@ enum oom_scan_t oom_scan_process_thread(
>  	 * Don't allow any other task to have access to the reserves.
>  	 */
>  	if (test_tsk_thread_flag(task, TIF_MEMDIE)) {
> -		if (oc->order != -1)
> -			return OOM_SCAN_ABORT;
> +		if (unlikely(frozen(task)))
> +			__thaw_task(task);

To simplify the discussion, let's ignore PF_FROZEN; this is another issue.

I am not sure this change is enough; we also need to ensure that
select_bad_process() won't pick the same task (or its sub-thread) again.
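
To make the "don't pick the same task again" requirement concrete, here is
a rough userspace sketch, not kernel code. Everything in it is hypothetical:
struct task stands in for task_struct, the memdie field for TIF_MEMDIE, and
select_next_victim() for whatever select_bad_process() would have to do.
It only shows the idea of skipping a whole thread group once any of its
threads has already been chosen.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for task_struct; not the kernel's layout. */
struct task {
	int tgid;             /* thread group id, shared by all sub-threads */
	unsigned long points; /* badness score, higher means better victim */
	bool memdie;          /* already chosen as a victim (TIF_MEMDIE analogue) */
};

/* True if any thread in the given thread group was already chosen. */
static bool group_has_victim(const struct task *tasks, size_t n, int tgid)
{
	for (size_t i = 0; i < n; i++)
		if (tasks[i].memdie && tasks[i].tgid == tgid)
			return true;
	return false;
}

/*
 * Return the index of the highest-scoring task whose thread group has
 * no victim yet, or -1 if every group already contains one.
 */
int select_next_victim(const struct task *tasks, size_t n)
{
	int best = -1;

	for (size_t i = 0; i < n; i++) {
		if (group_has_victim(tasks, n, tasks[i].tgid))
			continue; /* same task or one of its sub-threads */
		if (best < 0 || tasks[i].points > tasks[best].points)
			best = (int)i;
	}
	return best;
}
```

The quadratic scan is only for brevity; the point is that the exclusion has
to cover the thread group, not just the single thread that has the flag set.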

And perhaps something like

	wait_event_timeout(oom_victims_wait, !oom_victims,
				configurable_timeout);

before select_bad_process() makes sense?

Oleg.