From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S932137AbbI2CDh (ORCPT ); Mon, 28 Sep 2015 22:03:37 -0400
Received: from mail-wi0-f171.google.com ([209.85.212.171]:35437 "EHLO
	mail-wi0-f171.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1754311AbbI2CDg (ORCPT );
	Mon, 28 Sep 2015 22:03:36 -0400
Message-ID: <1443492214.3201.34.camel@gmail.com>
Subject: Re: [PATCH] sched/fair: Skip wake_affine() for core siblings
From: Mike Galbraith
To: Kirill Tkhai
Cc: linux-kernel@vger.kernel.org, Peter Zijlstra, Ingo Molnar
Date: Tue, 29 Sep 2015 04:03:34 +0200
In-Reply-To: <560992AC.7020909@odin.com>
References: <56058A3F.5060408@odin.com>
	<1443281111.3521.30.camel@gmail.com> <56091651.6070607@odin.com>
	<1443445947.3529.48.camel@gmail.com> <56095E7C.7080300@odin.com>
	<1443464557.2780.72.camel@gmail.com> <560992AC.7020909@odin.com>
Content-Type: text/plain; charset="UTF-8"
X-Mailer: Evolution 3.12.11
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, 2015-09-28 at 22:19 +0300, Kirill Tkhai wrote:

> >> Imagine a situation, when we share a mutex
> >> with a task on another NUMA node. When the task is realising the mutex
> >> it is waking us, but we definitelly won't use affine logic in this case.
> >
> > Why not? A wakeup is a wakeup is a wakeup, they all do the same thing.
> > If wake_wide() doesn't NAK an affine wakeup, we ask wake_affine() for
> > its opinion, then look for an idle CPU near the waker's CPU if it says
> > OK, or near wakee's previous CPU if it says go away.
>
> But NUMA sd does not have SD_WAKE_AFFINE flag, so this case a new cpu won't
> be choosen from previous node. There will be choosen the highest domain
> of smp_processor_id(), which has SD_BALANCE_WAKE flag, and the cpu will
> be choosen from the idlest group/cpu. And we don't have a deal with old
> cache at all. This looks like a completely wrong behaviour...

SD_WAKE_AFFINE is enabled globally by default, and SD_BALANCE_WAKE is
disabled globally due to cost and whatnot.

wingenfelder:~/:[0]# tune-sched-domains
{cpu0/domain0:SMT} SD flag: 4783
+    1: SD_LOAD_BALANCE:          Do load balancing on this domain
+    2: SD_BALANCE_NEWIDLE:       Balance when about to become idle
+    4: SD_BALANCE_EXEC:          Balance on exec
+    8: SD_BALANCE_FORK:          Balance on fork, clone
-   16: SD_BALANCE_WAKE:          Wake to idle CPU on task wakeup
+   32: SD_WAKE_AFFINE:           Wake task to waking CPU
-   64: [unused]
+  128: SD_SHARE_CPUCAPACITY:     Domain members share cpu power
-  256: SD_SHARE_POWERDOMAIN:     Domain members share power domain
+  512: SD_SHARE_PKG_RESOURCES:   Domain members share cpu pkg resources
- 1024: SD_SERIALIZE:             Only a single load balancing instance
- 2048: SD_ASYM_PACKING:          Place busy groups earlier in the domain
+ 4096: SD_PREFER_SIBLING:        Prefer to place tasks in a sibling domain
- 8192: SD_OVERLAP:               sched_domains of this level overlap
-16384: SD_NUMA:                  cross-node balancing
{cpu0/domain1:MC} SD flag: 4655
+    1: SD_LOAD_BALANCE:          Do load balancing on this domain
+    2: SD_BALANCE_NEWIDLE:       Balance when about to become idle
+    4: SD_BALANCE_EXEC:          Balance on exec
+    8: SD_BALANCE_FORK:          Balance on fork, clone
-   16: SD_BALANCE_WAKE:          Wake to idle CPU on task wakeup
+   32: SD_WAKE_AFFINE:           Wake task to waking CPU
-   64: [unused]
-  128: SD_SHARE_CPUCAPACITY:     Domain members share cpu power
-  256: SD_SHARE_POWERDOMAIN:     Domain members share power domain
+  512: SD_SHARE_PKG_RESOURCES:   Domain members share cpu pkg resources
- 1024: SD_SERIALIZE:             Only a single load balancing instance
- 2048: SD_ASYM_PACKING:          Place busy groups earlier in the domain
+ 4096: SD_PREFER_SIBLING:        Prefer to place tasks in a sibling domain
- 8192: SD_OVERLAP:               sched_domains of this level overlap
-16384: SD_NUMA:                  cross-node balancing
{cpu0/domain2:NUMA} SD flag: 25647
+    1: SD_LOAD_BALANCE:          Do load balancing on this domain
+    2: SD_BALANCE_NEWIDLE:       Balance when about to become idle
+    4: SD_BALANCE_EXEC:          Balance on exec
+    8: SD_BALANCE_FORK:          Balance on fork, clone
-   16: SD_BALANCE_WAKE:          Wake to idle CPU on task wakeup
+   32: SD_WAKE_AFFINE:           Wake task to waking CPU
-   64: [unused]
-  128: SD_SHARE_CPUCAPACITY:     Domain members share cpu power
-  256: SD_SHARE_POWERDOMAIN:     Domain members share power domain
-  512: SD_SHARE_PKG_RESOURCES:   Domain members share cpu pkg resources
+ 1024: SD_SERIALIZE:             Only a single load balancing instance
- 2048: SD_ASYM_PACKING:          Place busy groups earlier in the domain
- 4096: SD_PREFER_SIBLING:        Prefer to place tasks in a sibling domain
+ 8192: SD_OVERLAP:               sched_domains of this level overlap
+16384: SD_NUMA:                  cross-node balancing

	-Mike
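
For anyone wanting to check the three "SD flag" values by hand: each is a plain bitmask over the bit values printed in the listing above. A minimal sketch follows, assuming exactly those bit values (they belong to this kernel's listing and other kernel versions may number the flags differently). The helper name decode_sd_flags is made up for illustration and is not part of tune-sched-domains.

```python
# Bit values copied from the tune-sched-domains listing above.
# Assumption: they apply only to kernels that use this numbering.
SD_FLAGS = {
    1: "SD_LOAD_BALANCE",
    2: "SD_BALANCE_NEWIDLE",
    4: "SD_BALANCE_EXEC",
    8: "SD_BALANCE_FORK",
    16: "SD_BALANCE_WAKE",
    32: "SD_WAKE_AFFINE",
    # 64 is listed as [unused]
    128: "SD_SHARE_CPUCAPACITY",
    256: "SD_SHARE_POWERDOMAIN",
    512: "SD_SHARE_PKG_RESOURCES",
    1024: "SD_SERIALIZE",
    2048: "SD_ASYM_PACKING",
    4096: "SD_PREFER_SIBLING",
    8192: "SD_OVERLAP",
    16384: "SD_NUMA",
}


def decode_sd_flags(value):
    """Return the names of all flags whose bit is set in value.

    Hypothetical helper, for illustration only.
    """
    return [name for bit, name in SD_FLAGS.items() if value & bit]


# The three per-domain values reported in the listing.
for domain, value in (("SMT", 4783), ("MC", 4655), ("NUMA", 25647)):
    names = decode_sd_flags(value)
    print(domain, value,
          "WAKE_AFFINE" if "SD_WAKE_AFFINE" in names else "-",
          "BALANCE_WAKE" if "SD_BALANCE_WAKE" in names else "-")
```

Decoding 25647 this way gives SD_WAKE_AFFINE set and SD_BALANCE_WAKE clear for the NUMA domain too, which is the point of the listing.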