From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752411Ab3LBPhv (ORCPT ); Mon, 2 Dec 2013 10:37:51 -0500 Received: from mail-bk0-f47.google.com ([209.85.214.47]:58524 "EHLO mail-bk0-f47.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751819Ab3LBPhu (ORCPT ); Mon, 2 Dec 2013 10:37:50 -0500 Date: Mon, 2 Dec 2013 16:37:46 +0100 From: Ingo Molnar To: Peter Zijlstra Cc: Tejun Heo , Linus Torvalds , Thomas Gleixner , linux-kernel@vger.kernel.org, Andrew Morton Subject: Re: [PATCH] sched: Revert 14a40ffccd61 ("sched: replace PF_THREAD_BOUND with PF_NO_SETAFFINITY") Message-ID: <20131202153746.GA27781@gmail.com> References: <20131202152956.GI16796@laptop.programming.kicks-ass.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20131202152956.GI16796@laptop.programming.kicks-ass.net> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org * Peter Zijlstra wrote: > Subject: sched: Revert 14a40ffccd61 ("sched: replace PF_THREAD_BOUND with PF_NO_SETAFFINITY") > > PF_NO_SETAFFINITY, which crudely turns off affinity control and cgroups > access to tasks, and which is in use by every workqueue thread in Linux (!), > is conceptually wrong on many levels: > > - We should strive to never consciously place artificial limitations on > kernel functionality; our main reason to place limitations should be > correctness. > > There are cases where limiting affinity is justified: for example the > case of single cpu workqueue threads, which are special for their > limited concurrency, esp. when coupled with per-cpu resources -- > allowing such threads to run on other cpus is a correctness violation > and can crash the kernel. > > - But using it outside this case is overly broad; it dis-allows usage > that is functionally fine and in some cases desired. > > In particular; tj argues ( http://lkml.kernel.org/r/20131128143848.GD3925@htj.dyndns.org ) > > "That's just inviting people to do weirdest things and then > reporting things like "crypt jobs on some of our 500 machines end up > stuck on a single cpu once in a while" which will eventually be > tracked down to some weird shell script setting affinity on workers > doing something else." > > While that is exactly what HPC/RT people _want_ in order to avoid > disturbing the other CPUs with said crypt work. > > - Furthermore, the above example is also wrong in that you should not > protect root from itself; there's plenty root can do to shoot his > own foot off, let alone shoot his head off. > > Setting affinities is limited to root, and if root messes up the > system he can keep the pieces. But limiting in an overly broad > fashion stifles the functionality of the system. > > - Lastly; the flag actually blocks entry into cgroups as well as > sched_setaffinity(), so the name is misleading at best. > > The right fix is to only set PF_THREAD_BOUND on per-cpu workers: > > --- a/kernel/workqueue.c > +++ b/kernel/workqueue.c > > set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask); > > - /* prevent userland from meddling with cpumask of workqueue workers */ > - worker->task->flags |= PF_NO_SETAFFINITY; > - > /* > * The caller is responsible for ensuring %POOL_DISASSOCIATED > * remains stable across this function. See the comments above the > * flag definition for details. > */ > - if (pool->flags & POOL_DISASSOCIATED) > + if (pool->flags & POOL_DISASSOCIATED) { > worker->flags |= WORKER_UNBOUND; > + } else { > + /* > + * Prevent userland from meddling with cpumask of workqueue > + * workers: > + */ > + worker->task->flags |= PF_THREAD_BOUND; > + } > > Therefore revert the patch and add an annoying but non-destructive warning > check against abuse. Hm, I missed these problems with the approach, but I think you are right. Tejun, I suspect you concur with Peter's analysis, can I also add Peter's workqueue.c fixlet above to workqueue.c to this patch plus your Acked-by, so that the two changes are together? Thanks, Ingo