From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-qk0-f197.google.com (mail-qk0-f197.google.com [209.85.220.197]) by kanga.kvack.org (Postfix) with ESMTP id 4FB176B025F for ; Thu, 14 Jul 2016 09:23:04 -0400 (EDT) Received: by mail-qk0-f197.google.com with SMTP id a123so160993514qkd.2 for ; Thu, 14 Jul 2016 06:23:04 -0700 (PDT) Received: from mx1.redhat.com (mx1.redhat.com. [209.132.183.28]) by mx.google.com with ESMTPS id i139si1099703yba.223.2016.07.14.06.23.03 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 14 Jul 2016 06:23:03 -0700 (PDT) Date: Thu, 14 Jul 2016 15:22:59 +0200 From: Oleg Nesterov Subject: Re: bug in memcg oom-killer results in a hung syscall in another process in the same cgroup Message-ID: <20160714132258.GA1333@redhat.com> References: <20160711064150.GB5284@dhcp22.suse.cz> <20160712071927.GD14586@dhcp22.suse.cz> <57851224.2020902@yandex-team.ru> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Sender: owner-linux-mm@kvack.org List-ID: To: Shayan Pooya Cc: Konstantin Khlebnikov , Michal Hocko , koct9i@gmail.com, cgroups mailinglist , LKML , linux-mm@kvack.org On 07/12, Shayan Pooya wrote: > > > Yep. Bug still not fixed in upstream. In our kernel I've plugged it with > > this: > > > > --- a/kernel/sched/core.c > > +++ b/kernel/sched/core.c > > @@ -2808,8 +2808,9 @@ asmlinkage __visible void schedule_tail(struct > > task_struct *prev) > > balance_callback(rq); > > preempt_enable(); > > > > - if (current->set_child_tid) > > - put_user(task_pid_vnr(current), current->set_child_tid); > > + if (current->set_child_tid && > > + put_user(task_pid_vnr(current), current->set_child_tid)) > > + force_sig(SIGSEGV, current); > > } > > I just verified that with your patch there is no hung processes and I > see processes getting SIGSEGV as expected. Well, but we can't do this. And "as expected" is actually just wrong. I still think that the whole FAULT_FLAG_USER logic is not right. This needs another email. fork() should not fail because there is a memory hog in the same memcg. Worse, pthread_create() can kill the caller by the same reason. And we have the same or even worse problem with ->clear_child_tid, pthread_join() can hang forever. Unlikely we want to kill the application in this case ;) And in fact I think that the problem has nothing to do with set/claer_child_tid in particular. I am just curious... can you reproduce the problem reliably? If yes, can you try the patch below ? Just in case, this is not the real fix in any case... Oleg. --- x/kernel/sched/core.c +++ x/kernel/sched/core.c @@ -2793,8 +2793,11 @@ asmlinkage __visible void schedule_tail(struct task_struct *prev) balance_callback(rq); preempt_enable(); - if (current->set_child_tid) + if (current->set_child_tid) { + mem_cgroup_oom_enable(); put_user(task_pid_vnr(current), current->set_child_tid); + mem_cgroup_oom_disable(); + } } /* -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org