Subject: Re: [RFC][PATCH] x86, sched: allow topologies where NUMA nodes share an LLC
From: Dave Hansen
To: Peter Zijlstra
Cc: linux-kernel@vger.kernel.org, tony.luck@intel.com,
    tim.c.chen@linux.intel.com, hpa@linux.intel.com, bp@alien8.de,
    rientjes@google.com, imammedo@redhat.com, prarit@redhat.com,
    toshi.kani@hp.com, brice.goglin@gmail.com, mingo@kernel.org
Date: Tue, 7 Nov 2017 08:22:19 -0800
In-Reply-To: <20171107083019.GG3326@worktop>
References: <20171106221500.310295D7@viggo.jf.intel.com> <20171107083019.GG3326@worktop>

On 11/07/2017 12:30 AM, Peter Zijlstra wrote:
> On Mon, Nov 06, 2017 at 02:15:00PM -0800, Dave Hansen wrote:
>
>> But, the CPUID for the SNC configuration discussed above enumerates
>> the LLC as being shared by the entire package.  This is not 100%
>> precise because the entire cache is not usable by all accesses.  But,
>> it *is* the way the hardware enumerates itself, and this is not
>> likely to change.
>
> So CPUID and SRAT will remain inconsistent; even in future products?
> That would absolutely blow chunks.

It certainly isn't ideal as it stands.  If it were changed, what would
it be changed to?  You cannot even represent the current L3 topology in
CPUID, at least not precisely.
I've been arguing that we should optimize the CPUID information for
performance.  Right now, it's suboptimal for folks doing NUMA-local
allocations, and I think that's precisely the group of folks that needs
precise information.  I'm trying to get it changed going forward.

> If that is the case, we'd best use a fake feature like
> X86_BUG_TOPOLOGY_BROKEN and use that instead of an ever growing list
> of models in this code.

FWIW, I don't consider the current situation broken.  Nobody ever
promised the kernel that a NUMA node would never happen inside a
socket, or inside a cache boundary enumerated in CPUID.  The
assumptions the kernel made were sane, but the CPU's description of
itself *and* the BIOS-provided information are also sane.  But the
world changed, some of those assumptions turned out to be wrong, and
somebody needs to adjust.

...

>> +	if (!topology_same_node(c, o) &&
>> +	    (c->x86_model == INTEL_FAM6_SKYLAKE_X)) {
>
> This needs a c->x86_vendor test; imagine the fun when AMD releases a
> part with model == SKX ...

Yup, will do.

>> +		/* Use NUMA instead of coregroups for scheduling: */
>> +		x86_has_numa_in_package = true;
>> +
>> +		/*
>> +		 * Now, tell the truth, that the LLC matches.  But,
>> +		 * note that throwing away coregroups for
>> +		 * scheduling means this will have no actual effect.
>> +		 */
>> +		return true;
>
> What are the ramifications here?  Is anybody else using that cpumask
> outside of the scheduler topology setup?

I looked for it and didn't see anything else.  I'll double-check that
nothing has popped up since I hacked this together.