Date: Fri, 11 Nov 2016 17:57:43 +0100
From: Oleg Nesterov
To: Peter Zijlstra
Cc: Ingo Molnar, Linus Torvalds, Mike Galbraith, hartsjc@redhat.com,
    vbendel@redhat.com, vlovejoy@redhat.com, linux-kernel@vger.kernel.org
Subject: Re: sched/autogroup: race if !sysctl_sched_autogroup_enabled ?
Message-ID: <20161111165743.GA29869@redhat.com>
References: <20161109165933.GA26071@redhat.com> <20161109175005.GS3142@twins.programming.kicks-ass.net> <20161110130913.GA11933@redhat.com>
In-Reply-To: <20161110130913.GA11933@redhat.com>

On 11/10, Oleg Nesterov wrote:
>
> And the 3rd case which I didn't think about yesterday. And now I really hope
> it can explain the vmcore we have.
>
> If sysctl_sched_autogroup_enabled was enabled and then disabled, it is
> possible that the "autogrouped" process runs with ag->kref.refcount == 1,
> and if it does setsid() it frees its active task_group.

And yet another problem ;) The exiting thread must call sched_move_task()
somewhere before exit_notify() or it can run with the freed task_group()
after that.

And this means that the no-longer-needed PF_EXITING check in
task_wants_autogroup() will be needed again.

Simple, but needs the comments/changelog...
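
To make the exiting-thread problem above concrete, here is a small userspace
model, not kernel code. It is a sketch of one reading of the race: a thread
that has passed exit_notify() is no longer seen by the loop that moves the
group's threads, so it keeps pointing at the old task_group after that group
is freed. Everything in it (struct thread, on_thread_list, move_thread(),
move_group(), PF_EXITING_MODEL, root_group) is a hypothetical stand-in; only
PF_EXITING, sched_move_task(), exit_notify() and task_wants_autogroup() are
names taken from the mail. "Freed" is modelled as a flag so the example stays
well defined.

```c
#include <assert.h>
#include <stdbool.h>

#define PF_EXITING_MODEL 0x1   /* stand-in for the kernel's PF_EXITING */

struct task_group { bool freed; };

static struct task_group root_group;   /* stand-in for the root task group */

struct thread {
    unsigned int flags;
    bool on_thread_list;       /* false once past exit_notify() (assumed) */
    struct task_group *tg;     /* group the thread is scheduled in */
};

/* Models task_wants_autogroup() with the PF_EXITING check in place. */
static bool wants_autogroup(const struct thread *t)
{
    return !(t->flags & PF_EXITING_MODEL);
}

/* Models sched_move_task(): use the autogroup's tg or fall back to root. */
static void move_thread(struct thread *t, struct task_group *ag_tg)
{
    t->tg = wants_autogroup(t) ? ag_tg : &root_group;
}

/* Models the per-thread loop: only threads still on the list are moved. */
static void move_group(struct thread *threads, int n, struct task_group *new_tg)
{
    for (int i = 0; i < n; i++)
        if (threads[i].on_thread_list)
            move_thread(&threads[i], new_tg);
}

/*
 * Returns true if the exiting thread ends up pointing at a freed group.
 * move_before_exit selects whether the thread moves itself before it
 * drops off the thread list.
 */
bool exiting_thread_uses_freed_group(bool move_before_exit)
{
    struct task_group old_tg = { false }, new_tg = { false };
    struct thread threads[2] = {
        { 0, true, &old_tg },  /* a live thread */
        { 0, true, &old_tg },  /* the thread that is about to exit */
    };
    struct thread *exiting = &threads[1];

    exiting->flags |= PF_EXITING_MODEL;
    if (move_before_exit)
        move_thread(exiting, &old_tg);  /* PF_EXITING sends it to root_group */
    exiting->on_thread_list = false;    /* past exit_notify(): invisible now */

    move_group(threads, 2, &new_tg);    /* e.g. the live thread calls setsid() */
    old_tg.freed = true;                /* last reference to old group dropped */

    assert(threads[0].tg == &new_tg);   /* the live thread was moved */
    return exiting->tg->freed;
}
```

Calling exiting_thread_uses_freed_group(false) models the problem, and
exiting_thread_uses_freed_group(true) models the extra move before
exit_notify(). In this model it is the PF_EXITING check that makes the extra
move land in the root group rather than back in the autogroup.
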
> So I am going to send the patch which simply moves the sysctl check from
> autogroup_move_group() to sched_autogroup_create_attach(), but perhaps I
> should split this change?
>
> I mean, the first patch for -stable could just remove the current check,
> the 2nd one will add it into sched_autogroup_create_attach().

No, this is not enough, see above.

I am starting to think that we should just move ->autogroup from signal_struct
to task_struct. This will simplify the code and fix all these problems. But I
need a simple fix for backporting anyway.

Oleg.
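
For reference, the setsid() case and the "move the sysctl check" idea quoted
above can be pictured with a small userspace model. This is only a sketch
under stated assumptions, not the kernel code: the structures are reduced to
the fields that matter, "freeing" is a flag, and the exact placement of the
moved check (bail out before the move) is an assumption. The names
move_group(), create_attach(), enum check_site, signal_autogroup and
running_in are hypothetical stand-ins for autogroup_move_group(),
sched_autogroup_create_attach(), p->signal->autogroup and task_group(p).

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins; none of these are the real kernel definitions. */
struct task_group { bool freed; };

struct autogroup {
    int refcount;                        /* models ag->kref.refcount */
    struct task_group tg;
};

struct task {
    struct autogroup *signal_autogroup;  /* models p->signal->autogroup */
    struct task_group *running_in;       /* models task_group(p) */
};

enum check_site { CHECK_IN_MOVE_GROUP, CHECK_IN_CREATE_ATTACH };

static bool sysctl_enabled;   /* models sysctl_sched_autogroup_enabled */

static void autogroup_put(struct autogroup *ag)
{
    assert(ag->refcount > 0);
    if (--ag->refcount == 0)
        ag->tg.freed = true;  /* the kernel would free the task_group here */
}

static void move_group(struct task *p, struct autogroup *ag,
                       enum check_site site)
{
    struct autogroup *prev = p->signal_autogroup;

    ag->refcount++;
    p->signal_autogroup = ag;

    /* With the check here, a disabled sysctl skips the actual move... */
    if (site != CHECK_IN_MOVE_GROUP || sysctl_enabled)
        p->running_in = &ag->tg;         /* models sched_move_task() */

    autogroup_put(prev);                 /* ...but prev is dropped anyway */
}

static void create_attach(struct task *p, struct autogroup *ag,
                          enum check_site site)
{
    /* Assumed placement of the moved check: skip the move entirely. */
    if (!(site == CHECK_IN_CREATE_ATTACH && !sysctl_enabled))
        move_group(p, ag, site);

    autogroup_put(ag);                   /* drop the creation reference */
}

/* Returns true if setsid() leaves the task running in a freed group. */
bool setsid_leaves_task_in_freed_group(enum check_site site)
{
    struct autogroup old_ag = { .refcount = 1 };  /* held only via signal */
    struct autogroup new_ag = { .refcount = 1 };  /* creation reference */
    struct task p = { &old_ag, &old_ag.tg };

    sysctl_enabled = false;              /* was enabled, disabled by now */
    create_attach(&p, &new_ag, site);    /* what setsid() triggers */

    return p.running_in->freed;
}
```

With CHECK_IN_MOVE_GROUP the model reproduces the quoted scenario: the old
autogroup's refcount drops from 1 to 0 while the task still runs in its
task_group. With CHECK_IN_CREATE_ATTACH the old group survives in this model,
but the model covers only the setsid() path, so it says nothing about the
other cases that make this change insufficient on its own.
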