Date: Mon, 19 Oct 2020 13:38:36 +0100
From: Jonathan Cameron
To: Brice Goglin
Subject: Re: [RFC PATCH] topology: Represent clusters of CPUs within a die.
Message-ID: <20201019123836.00004877@Huawei.com>
In-Reply-To: <942b4d68-8d19-66d8-c84b-d17eba837e9a@inria.fr>
References: <20201016152702.1513592-1-Jonathan.Cameron@huawei.com>
 <942b4d68-8d19-66d8-c84b-d17eba837e9a@inria.fr>
Organization: Huawei Technologies Research and Development (UK) Ltd.
Cc: Len Brown, Greg Kroah-Hartman, x86@kernel.org, guohanjun@huawei.com,
 linux-kernel@vger.kernel.org, linuxarm@huawei.com, linux-acpi@vger.kernel.org,
 Sudeep Holla, Will Deacon, linux-arm-kernel@lists.infradead.org

On Mon, 19 Oct 2020 12:00:15 +0200
Brice Goglin wrote:

> Le 16/10/2020 à 17:27, Jonathan Cameron a écrit :
> > Both ACPI and DT provide the ability to describe additional layers of
> > topology between that of individual cores and higher-level constructs
> > such as the level at which the last level cache is shared.
> > In ACPI this can be represented in PPTT as a Processor Hierarchy
> > Node Structure [1] that is the parent of the CPU cores and in turn
> > has a parent Processor Hierarchy Node Structure representing
> > a higher level of topology.
> >
> > For example, Kunpeng 920 has clusters of 4 CPUs. These do not share
> > any cache resources, but the interconnect topology is such that
> > the cost to transfer ownership of a cacheline between CPUs within
> > a cluster is lower than between CPUs in different clusters on the same
> > die. Hence, it can make sense to deliberately schedule threads
> > sharing data to a single cluster.
> >
> > This patch simply exposes this information to userspace libraries
> > like hwloc by providing cluster_cpus and related sysfs attributes.
> > PoC of hwloc support at [2].
> >
> > Note this patch only handles the ACPI case.
> >
> > Special consideration is needed for SMT processors, where it is
> > necessary to move 2 levels up the hierarchy from the leaf nodes
> > (thus skipping the processor core level).
> >
> > Currently the ID provided is the offset of the Processor
> > Hierarchy Node Structure within PPTT. Whilst this is unique,
> > it is not terribly elegant, so alternative suggestions are welcome.
> >
> > Note that arm64 / ACPI does not provide any means of identifying
> > a die level in the topology, but that may be unrelated to the cluster
> > level.
> >
> > RFC questions:
> > 1) Naming.
> > 2) Related to naming, do we want to represent all potential levels,
> >    or is this enough? On Kunpeng 920, the next level up from cluster
> >    happens to be covered by LLC sharing, but in theory more than one
> >    level of cluster description might be needed by some future system.
> > 3) Do we need DT code in place? I'm not sure any DT-based arm64
> >    systems would have enough complexity for this to be useful.
> > 4) Other architectures? Is this useful on x86 for example?
>
>
> Hello Jonathan

Hi Brice,

> Intel has CPUID registers to describe "tiles" and "modules" too (not
> used yet as far as I know). The list of levels could become quite long
> if any processor ever exposes those.
> If having multiple cluster levels
> is possible, maybe it's time to think about introducing some sort of
> generic levels:

I've been wondering what tiles and modules are... Looking back at naming
over time, I'm going to guess tiles are the closest to the particular case
I was focusing on here.

> cluster0_id = your cluster_id
> cluster0_cpus/cpulist = your cluster_cpus/cpulist
> cluster0_type = would optionally contain hardware-specific info such as
>                 "module" or "tile" on x86
> cluster_levels = 1

I wondered about exactly the same question. At this point, perhaps we just
statically introduce a 0 index, but with the assumption that we would
extend it as and when necessary in the future.

> hwloc already does something like this for some "rare" levels such as
> s390 books/drawers (by the way, thanks a lot for the hwloc PoC, very good
> job); we call them "Groups" instead of "cluster" above.

Given we definitely have a 'naming' issue here, perhaps group0 etc. is a
good generic choice?

> However I don't know if the Linux scheduler would like that. Is it
> better to have 10+ levels with static names, or a dynamic number of levels?

So far our 'general' experiments with adding clusters into the kernel
scheduler have been a typically mixed bag. Hence the proposal to just
expose the structure to userspace, where we should at least know what the
workload is. Hopefully we'll gain more experience with using it and use
that to drive possible kernel scheduler changes.

> Brice

Thanks,

Jonathan
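For reference, a minimal userspace sketch of how the attributes discussed
above might be consumed. The attribute names cluster_id and
cluster_cpus_list are assumptions, following the existing
core_cpus/core_cpus_list convention mentioned in the patch; they are not
confirmed by this thread, and Brice's generic cluster0_* naming would need
the paths adjusting accordingly.

    /*
     * Sketch only: print the assumed cluster topology attributes for the
     * first few CPUs, skipping CPUs where the files are absent (older
     * kernel or no cluster information in PPTT).
     */
    #include <stdio.h>
    #include <string.h>

    static int read_attr(int cpu, const char *attr, char *buf, size_t len)
    {
        char path[128];
        FILE *f;

        snprintf(path, sizeof(path),
                 "/sys/devices/system/cpu/cpu%d/topology/%s", cpu, attr);
        f = fopen(path, "r");
        if (!f)
            return -1;          /* attribute not present */
        if (!fgets(buf, len, f)) {
            fclose(f);
            return -1;
        }
        fclose(f);
        buf[strcspn(buf, "\n")] = '\0';   /* strip trailing newline */
        return 0;
    }

    int main(void)
    {
        char id[64], cpus[256];
        int cpu;

        for (cpu = 0; cpu < 8; cpu++) {   /* first 8 CPUs, for illustration */
            if (read_attr(cpu, "cluster_id", id, sizeof(id)) ||
                read_attr(cpu, "cluster_cpus_list", cpus, sizeof(cpus)))
                continue;
            printf("cpu%d: cluster_id=%s cluster_cpus_list=%s\n",
                   cpu, id, cpus);
        }
        return 0;
    }

On a Kunpeng 920-style system with 4-CPU clusters, one would expect output
along the lines of "cpu0: cluster_id=... cluster_cpus_list=0-3" (values
illustrative only).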