Message-ID: <64eb52ee-b3ac-3d94-cfce-ceb1c88dddb6@linaro.org>
Date: Wed, 22 Jun 2022 15:52:49 +0200
Subject: Re: [PATCH v4 4/4] arm64: dts: qcom: sdm845: Add CPU BWMON
To: Rajendra Nayak, Andy Gross, Bjorn Andersson, Georgi Djakov, Rob Herring, Catalin Marinas, Will Deacon, linux-arm-msm@vger.kernel.org, linux-pm@vger.kernel.org, devicetree@vger.kernel.org, linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org
Cc: Thara Gopinath
References: <20220601101140.170504-1-krzysztof.kozlowski@linaro.org> <20220601101140.170504-5-krzysztof.kozlowski@linaro.org>
From: Krzysztof Kozlowski

On 22/06/2022 13:46, Rajendra Nayak wrote:
>
> On 6/1/2022 3:41 PM, Krzysztof Kozlowski wrote:
>> Add device node for CPU-memory BWMON device (bandwidth monitoring) on
>> SDM845 measuring bandwidth between CPU (gladiator_noc) and Last Level
>> Cache (memnoc). Usage of this BWMON allows to remove fixed bandwidth
>> votes from cpufreq (CPU nodes) thus achieve high memory throughput even
>> with lower CPU frequencies.
>>
>> Performance impact (SDM845-MTP RB3 board, linux next-20220422):
>> 1. No noticeable impact when running with schedutil or performance
>>    governors.
>>
>> 2. When comparing to customized kernel with synced interconnects and
>>    without bandwidth votes from CPU freq, the sysbench memory tests
>>    show significant improvement with bwmon for blocksizes past the L3
>>    cache. The results for such superficial comparison:
>>
>> sysbench memory test, results in MB/s (higher is better)
>>  bs kB |  type |     V | V+no bw votes | bwmon | benefit %
>>      1 | W/seq | 14795 |          4816 |  4985 |      3.5%
>>     64 | W/seq | 41987 |         10334 | 10433 |      1.0%
>>   4096 | W/seq | 29768 |          8728 | 32007 |    266.7%
>>  65536 | W/seq | 17711 |          4846 | 18399 |    279.6%
>> 262144 | W/seq | 16112 |          4538 | 17429 |    284.1%
>>     64 | R/seq | 61202 |         67092 | 66804 |     -0.4%
>>   4096 | R/seq | 23871 |          5458 | 24307 |    345.4%
>>  65536 | R/seq | 18554 |          4240 | 18685 |    340.7%
>> 262144 | R/seq | 17524 |          4207 | 17774 |    322.4%
>>     64 | W/rnd |  2663 |          1098 |  1119 |      1.9%
>>  65536 | W/rnd |   600 |           316 |   610 |     92.7%
>>     64 | R/rnd |  4915 |          4784 |  4594 |     -4.0%
>>  65536 | R/rnd |   664 |           281 |   678 |    140.7%
>>
>> Legend:
>> bs kB: block size in KB (small block size means only L1-3 caches are
>>       used
>> type: R - read, W - write, seq - sequential, rnd - random
>> V: vanilla (next-20220422)
>> V + no bw votes: vanilla without bandwidth votes from CPU freq
>> bwmon: bwmon without bandwidth votes from CPU freq
>> benefit %: difference between vanilla without bandwidth votes and bwmon
>>            (higher is better)
>>
>> Co-developed-by: Thara Gopinath
>> Signed-off-by: Thara Gopinath
>> Signed-off-by: Krzysztof Kozlowski
>> ---
>>  arch/arm64/boot/dts/qcom/sdm845.dtsi | 54 ++++++++++++++++++++++++++++
>>  1 file changed, 54 insertions(+)
>>
>> diff --git a/arch/arm64/boot/dts/qcom/sdm845.dtsi b/arch/arm64/boot/dts/qcom/sdm845.dtsi
>> index 83e8b63f0910..adffb9c70566 100644
>> --- a/arch/arm64/boot/dts/qcom/sdm845.dtsi
>> +++ b/arch/arm64/boot/dts/qcom/sdm845.dtsi
>> @@ -2026,6 +2026,60 @@ llcc: system-cache-controller@1100000 {
>>  			interrupts = ;
>>  		};
>>
>> +		pmu@1436400 {
>> +			compatible = "qcom,sdm845-cpu-bwmon";
>> +			reg = <0 0x01436400 0 0x600>;
>> +
>> +			interrupts = ;
>> +
>> +			interconnects = <&gladiator_noc MASTER_APPSS_PROC 3 &mem_noc SLAVE_EBI1 3>,
>> +					<&osm_l3 MASTER_OSM_L3_APPS &osm_l3 SLAVE_OSM_L3>;
>> +			interconnect-names = "ddr", "l3c";
>
> Is this the pmu/bwmon instance between the cpu and caches or the one between the caches and DDR?

To my understanding, this is the one between the CPU and the caches.

> Depending on which one it is, shouldn't we just be scaling either one and not both the interconnect paths?

The interconnects are the same as the ones used for the CPU nodes.
Therefore, if we want to scale both when scaling the CPU, we also want to
scale both when we see traffic between the CPU and the cache.

Maybe that assumption is not correct, in which case the two interconnects
in the CPU nodes are also not proper?

Best regards,
Krzysztof