Date: Thu, 16 Mar 2017 17:31:59 +0100
From: Oleg Nesterov
To: Tejun Heo
Cc: Linus Torvalds, Andrew Morton, Peter Zijlstra, Thomas Gleixner, Chris Mason, linux-kernel@vger.kernel.org, kernel-team@fb.com, Li Zefan, Johannes Weiner, cgroups@vger.kernel.org
Subject: Re: [PATCH 2/2] kthread, cgroup: close race window where new kthreads can be migrated to non-root cgroups
Message-ID: <20170316163158.GB27613@redhat.com>
References: <20170315231827.GA13656@htj.duckdns.org> <20170315231920.GB13656@htj.duckdns.org> <20170316150233.GB24478@redhat.com> <20170316153925.GA26391@redhat.com> <20170316160734.GD15810@htj.duckdns.org>
In-Reply-To: <20170316160734.GD15810@htj.duckdns.org>

On 03/16, Tejun Heo wrote:
>
> > --- x/kernel/kthread.c
> > +++ x/kernel/kthread.c
> > @@ -226,6 +226,7 @@
> >  	ret = -EINTR;
> >  	if (!test_bit(KTHREAD_SHOULD_STOP, &self->flags)) {
> >  		__kthread_parkme(self);
> > +		current->flags &= ~PF_IDONTLIKECGROUPS;
> >  		ret = threadfn(data);
> >  	}
> >  	do_exit(ret);
> > @@ -537,7 +538,7 @@
> >  	set_cpus_allowed_ptr(tsk, cpu_all_mask);
> >  	set_mems_allowed(node_states[N_MEMORY]);
> >
> > -	current->flags |= PF_NOFREEZE;
> > +	current->flags |= (PF_NOFREEZE | PF_IDONTLIKECGROUPS);
> >
> >  	for (;;) {
> >  		set_current_state(TASK_INTERRUPTIBLE);
> > --- x/kernel/cgroup/cgroup.c
> > +++ x/kernel/cgroup/cgroup.c
> > @@ -2429,7 +2429,7 @@
> >  	 * trapped in a cpuset, or RT worker may be born in a cgroup
> >  	 * with no rt_runtime allocated.  Just say no.
> >  	 */
> > -	if (tsk == kthreadd_task || (tsk->flags & PF_NO_SETAFFINITY)) {
> > +	if (tsk->flags & (PF_NO_SETAFFINITY | PF_IDONTLIKECGROUPS)) {
> >  		ret = -EINVAL;
> >  		goto out_unlock_rcu;
> >  	}
>
> Absolutely.  If we're willing to spend a PF flag on it, we can
> properly wait for it too instead of failing it.

Or we can add another "unsigned no_cgroups:1" bit into task_struct,
not sure.

Anyway, I do not understand the PF_NO_SETAFFINITY check in
__cgroup_procs_write(). task_can_attach() checks it too, so cgroups can't
change the affinity. Imo something explicit like no_cgroups makes more sense.

Oleg.
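
To illustrate the "no_cgroups" idea, here is a rough userspace sketch, not kernel code. struct task_model, can_migrate() and self_check() are made-up names for illustration only; the one thing taken from the thread is the "unsigned no_cgroups:1" bit and the point at which the quoted diff clears its flag (right before threadfn() runs).

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for task_struct; models only the bit discussed above. */
struct task_model {
	const char *comm;
	unsigned no_cgroups:1;	/* set while the task must stay in the root cgroup */
};

/* Model of the migration check: refuse any task that still has the bit set. */
static int can_migrate(const struct task_model *tsk)
{
	return tsk->no_cgroups ? -EINVAL : 0;
}

/* Walks through the lifetime sketched in the quoted diff. */
void self_check(void)
{
	/* A freshly created kthread inherits the bit from its creator. */
	struct task_model kt = { .comm = "kworker", .no_cgroups = 1 };

	/* Migration is rejected while the kthread is still being set up. */
	assert(can_migrate(&kt) == -EINVAL);

	/* The bit is dropped just before threadfn() would be called. */
	kt.no_cgroups = 0;
	assert(can_migrate(&kt) == 0);
}
```

An explicit bit like this keeps the "do not migrate" rule separate from PF_NO_SETAFFINITY, which is the distinction argued for above.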