Date: Tue, 7 Nov 2017 09:30:19 +0100
From: Peter Zijlstra
To: Dave Hansen
Cc: linux-kernel@vger.kernel.org, tony.luck@intel.com,
	tim.c.chen@linux.intel.com, hpa@linux.intel.com, bp@alien8.de,
	rientjes@google.com, imammedo@redhat.com, prarit@redhat.com,
	toshi.kani@hp.com, brice.goglin@gmail.com, mingo@kernel.org
Subject: Re: [RFC][PATCH] x86, sched: allow topologies where NUMA nodes share an LLC
Message-ID: <20171107083019.GG3326@worktop>
References: <20171106221500.310295D7@viggo.jf.intel.com>
In-Reply-To: <20171106221500.310295D7@viggo.jf.intel.com>

On Mon, Nov 06, 2017 at 02:15:00PM -0800, Dave Hansen wrote:
> But, the CPUID for the SNC configuration discussed above enumerates
> the LLC as being shared by the entire package. This is not 100%
> precise because the entire cache is not usable by all accesses. But,
> it *is* the way the hardware enumerates itself, and this is not likely
> to change.

So CPUID and SRAT will remain inconsistent, even in future products?
That would absolutely blow chunks.

If that is the case, we'd best use a fake feature like
X86_BUG_TOPOLOGY_BROKEN and use that instead of an ever-growing list of
models in this code (rough sketch of what I mean at the end of this
mail).

> +/*
> + * Set if a package/die has multiple NUMA nodes inside.
> + * AMD Magny-Cours, Intel Cluster-on-Die, and Intel
> + * Sub-NUMA Clustering have this.
> + */
> +static bool x86_has_numa_in_package;
> +
>  static bool match_llc(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
>  {
>  	int cpu1 = c->cpu_index, cpu2 = o->cpu_index;
>
> +	/* Do not match if we do not have a valid APICID for cpu: */
> +	if (per_cpu(cpu_llc_id, cpu1) == BAD_APICID)
> +		return false;
> +
> +	/* Do not match if LLC id does not match: */
> +	if (per_cpu(cpu_llc_id, cpu1) != per_cpu(cpu_llc_id, cpu2))
> +		return false;
>
> +	/*
> +	 * Some Intel CPUs enumerate an LLC that is shared by
> +	 * multiple NUMA nodes. The LLC on these systems is
> +	 * shared for off-package data access but private to the
> +	 * NUMA node (half of the package) for on-package access.
> +	 *
> +	 * CPUID can only enumerate the cache as being shared *or*
> +	 * unshared, but not this particular configuration. The
> +	 * CPU in this case enumerates the cache to be shared
> +	 * across the entire package (spanning both NUMA nodes).
> +	 */
> +	if (!topology_same_node(c, o) &&
> +	    (c->x86_model == INTEL_FAM6_SKYLAKE_X)) {

This needs a c->x86_vendor test; imagine the fun when AMD releases a
part with model == SKX ...

> +		/* Use NUMA instead of coregroups for scheduling: */
> +		x86_has_numa_in_package = true;
> +
> +		/*
> +		 * Now, tell the truth, that the LLC matches. But,
> +		 * note that throwing away coregroups for
> +		 * scheduling means this will have no actual effect.
> +		 */
> +		return true;

What are the ramifications here? Is anybody else using that cpumask
outside of the scheduler topology setup?

> +	}
> +
> +	return topology_sane(c, o, "llc");
>  }
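
For the X86_BUG_TOPOLOGY_BROKEN thing, something like the below is
roughly what I have in mind. Completely untested sketch; the table and
helper names are made up, the bug bit would need a slot reserved in
cpufeatures.h, and the helper would have to be called from early CPU
setup somewhere:

	#include <asm/cpu_device_id.h>
	#include <asm/intel-family.h>

	/* Parts whose CPUID cache enumeration disagrees with SRAT: */
	static const struct x86_cpu_id topo_broken_ids[] = {
		{ X86_VENDOR_INTEL, 6, INTEL_FAM6_SKYLAKE_X },
		{}
	};

	static void detect_topology_broken(struct cpuinfo_x86 *c)
	{
		/* x86_match_cpu() checks vendor, family and model of the boot CPU: */
		if (x86_match_cpu(topo_broken_ids))
			set_cpu_bug(c, X86_BUG_TOPOLOGY_BROKEN);
	}

match_llc() then loses the model check entirely:

	if (!topology_same_node(c, o) &&
	    boot_cpu_has_bug(X86_BUG_TOPOLOGY_BROKEN)) {

and the vendor issue above goes away too, because x86_match_cpu()
already matches on vendor.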