From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754471AbcKIQ71 (ORCPT ); Wed, 9 Nov 2016 11:59:27 -0500 Received: from mx1.redhat.com ([209.132.183.28]:44148 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754127AbcKIQ70 (ORCPT ); Wed, 9 Nov 2016 11:59:26 -0500 Date: Wed, 9 Nov 2016 17:59:33 +0100 From: Oleg Nesterov To: Ingo Molnar , Linus Torvalds , Mike Galbraith , Peter Zijlstra Cc: hartsjc@redhat.com, vbendel@redhat.com, vlovejoy@redhat.com, linux-kernel@vger.kernel.org Subject: sched/autogroup: race if !sysctl_sched_autogroup_enabled ? Message-ID: <20161109165933.GA26071@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.5.18 (2008-05-17) X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.39]); Wed, 09 Nov 2016 16:59:25 +0000 (UTC) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org I am trying to investigate a bug-report which looks as an autogroup bug, and it seems I found the race which _might_ explain the problem. I'll try to make the fix tomorrow but could you confirm I got it right and answer the question below? Let's look at task_wants_autogroup() /* * We can only assume the task group can't go away on us if * autogroup_move_group() can see us on ->thread_group list. */ if (p->flags & PF_EXITING) return false; Firstly, I think that this PF_EXITING check is no longer needed. sched_change_group() can be called by autogroup or cgroups code. autogroup is obviously fine, cgroup-attach will never try to migrate the exiting task (cgroup_threadgroup_rwsem helps), cpu_cgroup_fork() is obviously fine too. But!!! at the same time the comment is _correct_ even if very cryptic ;) We need to ensure that autogroup/tg returned by autogroup_task_group() can't go away if we race with autogroup_move_group(), and unless the caller holds ->siglock we rely on fact that autogroup_move_group() will a) see this task and b) do sched_move_task() which needs the same same rq->lock. However. autogroup_move_group() skips for_each_thread/sched_move_task if sysctl_sched_autogroup_enabled == 0. So. Doesn't this mean that cgroup migration to the root cgroup can race with autogroup_move_group() and use the soon-to-be-freed autogroup->tg? Or, even simpler, cgroup_post_fork()->cpu_cgroup_fork() can hit the same race if CLONE_TRHEAD? Or I am totally confused? --------------------------------------------------------------------- Now the qustions. Can't we simplify autogroup_task_get() and avoid lock_task_sighand() ? struct autogroup *autogroup_task_get(struct task_struct *p) { struct autogroup *ag rcu_read_lock(); for (;;) { // it is freed by sched_free_group_rcu() path // and thus ->autogroup is rcu-safe too. ag = READ_ONCE(p->signal->autogroup); if (kref_get_unless_zero(&ag->kref)) break; } rcu_read_unlock(); return ag; } although this is a bit off-topic. Another question is that I fail to understand why sched_autogroup_create_attach() does autogroup_create() and changes signal->autogroup even if !sysctl_sched_autogroup_enabled. IOW, even ignoring the problem above, what is wrong with this patch? --- x/kernel/sched/auto_group.c +++ x/kernel/sched/auto_group.c @@ -152,8 +152,12 @@ out: /* Allocates GFP_KERNEL, cannot be called under any spinlock */ void sched_autogroup_create_attach(struct task_struct *p) { - struct autogroup *ag = autogroup_create(); + struct autogroup *ag; + + if (!sysctl_sched_autogroup_enabled) + return; + ag = autogroup_create(); autogroup_move_group(p, ag); /* drop extra reference added by autogroup_create() */ autogroup_kref_put(ag); or even --- x/kernel/sched/auto_group.c +++ x/kernel/sched/auto_group.c @@ -64,9 +64,13 @@ static inline struct autogroup *autogrou static inline struct autogroup *autogroup_create(void) { - struct autogroup *ag = kzalloc(sizeof(*ag), GFP_KERNEL); + struct autogroup *ag; struct task_group *tg; + if (!sysctl_sched_autogroup_enabled) + goto xxx; + + ag = kzalloc(sizeof(*ag), GFP_KERNEL); if (!ag) goto out_fail; @@ -103,7 +107,7 @@ out_fail: printk(KERN_WARNING "autogroup_create: %s failure.\n", ag ? "sched_create_group()" : "kmalloc()"); } - +xxx: return autogroup_kref_get(&autogroup_default); } Oleg.