From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1754919AbcDDJiu (ORCPT ); Mon, 4 Apr 2016 05:38:50 -0400
Received: from mail-lf0-f65.google.com ([209.85.215.65]:33909 "EHLO
	mail-lf0-f65.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1750810AbcDDJit (ORCPT );
	Mon, 4 Apr 2016 05:38:49 -0400
Date: Mon, 4 Apr 2016 11:38:44 +0200
From: Ingo Molnar
To: Jiri Olsa
Cc: Peter Zijlstra, James Hartsock, Rik van Riel, Srivatsa Vaddagiri,
	Kirill Tkhai, linux-kernel@vger.kernel.org
Subject: Re: [RFC] sched: unused cpu in affine workload
Message-ID: <20160404093844.GA16017@gmail.com>
References: <20160404082302.GB2137@krava.local>
	<20160404085944.GA3030@gmail.com>
	<20160404091951.GA10360@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20160404091951.GA10360@gmail.com>
User-Agent: Mutt/1.5.23 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

* Ingo Molnar wrote:

> So my thinking here is: if the NUMA balancing code (which is node granular at
> the moment and uses node masks, etc.) is extended to be CPU granular (which is a
> big task in itself), then the two problems can be 'unified':
>
>  - the NUMA balancing code inputs arbitrary CPU (node) affinity masks from the
>    MM code into the scheduler.
>
>  - the scheduler syscall ABI (and other configuration sources) inputs arbitrary
>    CPU affinity masks into the scheduler.
>
> it's a similar problem, with two (minor looking) complications:

btw., this highlights how hard the optimization problem is: the NUMA balancing
code is (at least ...) O(nr_nodes^2) complex - but we had O(nr_nodes^3) passes too
in some of the NUMA balancing submissions...

We'd upgrade that to O(nr_cpus^2), which is totally unrealistic with 16,000 CPUs
even in a slowpath - but it would probably cause problems even with 120 CPUs.
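
To put rough numbers on that scaling, here is a minimal sketch. It is not
kernel code: pairwise_steps() is a hypothetical helper that just counts
(source, destination) pairs as a stand-in for one O(n^2) pass, and the node
count of 8 is an assumed example value. Only the 120 and 16,000 CPU figures
come from the discussion above.

```python
# Illustrative arithmetic only: model one O(n^2) balancing pass as the
# number of ordered (source, destination) pairs it would have to consider.
# This is an assumption for the sake of the example, not the scheduler's
# actual cost model.

def pairwise_steps(n: int) -> int:
    """Pairs visited by a pass that compares every unit with every unit."""
    return n * n

NR_NODES = 8  # hypothetical node count, chosen only for comparison

for label, n in (("nodes", NR_NODES), ("cpus", 120), ("cpus", 16_000)):
    print(f"{n:>6} {label}: {pairwise_steps(n):>12,} pair evaluations")
```

Under this model a node-granular pass stays in the tens of pair evaluations,
while the same pass at CPU granularity grows with the square of the CPU count.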

It will get quadratically worse as the number of CPUs in a system increases on its
current exponential trajectory ...

So the safest bet would be to restrict any 'perfect' balancing attempts to node
boundaries.

Which won't solve the problem you outlined to begin with.

Thanks,

	Ingo