Date: Tue, 16 Oct 2007 19:23:03 -0700
From: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
To: Ken Chen
Cc: Ingo Molnar, Nick Piggin, "Siddha, Suresh B", Andrew Morton, Linux Kernel Mailing List
Subject: Re: [patch] sched: fix improper load balance across sched domain
Message-ID: <20071017022303.GA27457@linux-os.sc.intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Oct 16, 2007 at 12:07:06PM -0700, Ken Chen wrote:
> We recently discovered a nasty performance bug in the kernel CPU load
> balancer, where we were hit by a 50% performance regression.
>
> When tasks are assigned via cpu affinity to a subset of CPUs that spans
> sched_domains (either a ccNUMA node or the new multi-core domain), the
> kernel fails to perform proper load balancing at these domains, because
> several pieces of logic in find_busiest_group() misidentify the busiest
> sched group within a given domain. This leads to inadequate load
> balancing and causes the 50% performance hit.
>
> To give you a concrete example, on a dual-core, 2 socket numa system,
> there are 4 logical CPUs, organized as:

oops, this issue can easily happen when cores are not sharing caches. I
think this is what is happening on your setup, right?

thanks,
suresh
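
[Editor's note: a toy model, not kernel code, sketching the failure mode the
patch describes. The topology (2 sockets x 2 cores), the load numbers, and the
`group_load` helper are all made up for illustration: when the balancer compares
whole-group load totals, two groups can look perfectly balanced even though the
subset of CPUs a pinned task may actually run on is badly imbalanced.]

```python
# Toy model: 2 sockets (sched groups), 2 cores each -> 4 logical CPUs.
# Hypothetical loads; pinned tasks may run only on cpu0 and cpu2.

def group_load(loads, cpus):
    """Sum of per-CPU load over one sched group (toy stand-in for the
    group-level accounting done in find_busiest_group())."""
    return sum(loads[c] for c in cpus)

socket0, socket1 = [0, 1], [2, 3]

# cpu0 carries 3 units of pinned load, cpu2 only 1; unrelated, unpinned
# load on cpu3 happens to make the group totals come out equal.
loads = {0: 3.0, 1: 0.0, 2: 1.0, 3: 2.0}

# At the domain spanning both sockets, the group totals look balanced,
# so a total-based comparison finds no "busiest" group and does nothing:
balanced = group_load(loads, socket0) == group_load(loads, socket1)

# ...yet among the CPUs the pinned tasks are allowed to use (0 and 2),
# the load is 3 vs 1, and moving a pinned task to cpu2 would help.
pinned_imbalance = loads[0] - loads[2]
```

Under these made-up numbers, `balanced` is True while `pinned_imbalance` is 2.0,
which is the shape of the bug: the group-level view hides an imbalance that
exists on the affinity-restricted subset of CPUs.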