From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1753581AbbE1KVk (ORCPT ); Thu, 28 May 2015 06:21:40 -0400
Received: from bombadil.infradead.org ([198.137.202.9]:57746 "EHLO
	bombadil.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751529AbbE1KVc (ORCPT );
	Thu, 28 May 2015 06:21:32 -0400
Date: Thu, 28 May 2015 12:21:27 +0200
From: Peter Zijlstra
To: Josef Bacik
Cc: riel@redhat.com, mingo@redhat.com, linux-kernel@vger.kernel.org
Subject: Re: [PATCH RESEND] sched: prefer an idle cpu vs an idle sibling for BALANCE_WAKE
Message-ID: <20150528102127.GD3644@twins.programming.kicks-ass.net>
References: <1432761736-22093-1-git-send-email-jbacik@fb.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1432761736-22093-1-git-send-email-jbacik@fb.com>
User-Agent: Mutt/1.5.21 (2012-12-30)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, May 27, 2015 at 05:22:16PM -0400, Josef Bacik wrote:
> SD_BALANCE_WAKE is supposed to find us an idle cpu to run on, however

sd->flags's SD_BALANCE_WAKE, not sd_flags. And note that SD_BALANCE_WAKE
is not set on domains by default.

> it is just looking for an idle sibling, preferring affinity over all
> else.

Not sure what you're saying here, what affinity?

> This is not helpful in all cases, and SD_BALANCE_WAKE's job is
> to find us an idle cpu, not garuntee affinity.

Your argument is going backwards, SD_BALANCE_WAKE is not actually set.

> Fix this by first
> trying to find an idle sibling, and then if the cpu is not idle fall
> through to the logic to find an idle cpu. With this patch we get
> slightly better performance than with our forward port of
> SD_WAKE_IDLE.

This is broken. You most certainly do not want to go do that whole load
balance pass on wakeups. It should be controlled by sd->flags.
It is far too expensive to consider turning that on by default. In fact, select_idle_sibling() is already too expensive on current server hardware (far too damn many cpus in a LLC domain).
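
To make the sd->flags point concrete, here is a rough user-space sketch of
the gating I mean. This is not the kernel code: every toy_* name, the
struct layout and the flag value are made up for illustration, and the
"expensive" pass is reduced to a plain least-loaded scan. It assumes the
wake target lies inside the domain and that the domain has at least one
cpu.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for SD_BALANCE_WAKE; the value is arbitrary. */
#define TOY_SD_BALANCE_WAKE 0x1u

/* Hypothetical, heavily simplified stand-in for a sched domain. */
struct toy_domain {
	unsigned int flags;	/* per-domain flags, analogous to sd->flags */
	const int *cpus;	/* ids of the cpus spanned by this domain */
	size_t nr_cpus;		/* number of entries in cpus[], assumed >= 1 */
	const bool *idle;	/* idle[cpu] is true when that cpu is idle */
};

/* Cheap path: keep the target if it is idle, else take the first idle cpu. */
static int toy_select_idle_sibling(const struct toy_domain *sd, int target)
{
	if (sd->idle[target])
		return target;

	for (size_t i = 0; i < sd->nr_cpus; i++) {
		if (sd->idle[sd->cpus[i]])
			return sd->cpus[i];
	}

	return target;	/* nothing idle, stay where we were going */
}

/* Expensive path stand-in: scan every cpu for the lowest load. */
static int toy_find_least_loaded(const struct toy_domain *sd,
				 const unsigned long *load)
{
	int best = sd->cpus[0];

	for (size_t i = 1; i < sd->nr_cpus; i++) {
		if (load[sd->cpus[i]] < load[best])
			best = sd->cpus[i];
	}

	return best;
}

/* Wakeup placement: the full scan only runs when the domain opted in. */
int toy_select_cpu_on_wake(const struct toy_domain *sd,
			   const unsigned long *load, int target)
{
	int cpu = toy_select_idle_sibling(sd, target);

	if (sd->idle[cpu])
		return cpu;

	/* Flag clear on this domain: do not pay for the scan. */
	if (!(sd->flags & TOY_SD_BALANCE_WAKE))
		return cpu;

	return toy_find_least_loaded(sd, load);
}
```

With the flag clear, a wakeup that finds no idle sibling stays on the
target and never touches the scan. Only a domain that has the flag set
in its own flags word pays for the extra pass.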