From: Georgi Djakov <djakov@kernel.org>
To: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>,
	Andy Gross <agross@kernel.org>,
	Bjorn Andersson <bjorn.andersson@linaro.org>,
	Rob Herring <robh+dt@kernel.org>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will@kernel.org>,
	linux-arm-msm@vger.kernel.org, linux-pm@vger.kernel.org,
	devicetree@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org
Cc: Thara Gopinath <thara.gopinath@linaro.org>
Subject: Re: [PATCH v4 4/4] arm64: dts: qcom: sdm845: Add CPU BWMON
Date: Mon, 6 Jun 2022 23:39:19 +0300
Message-ID: <058de46e-24cf-e25b-121c-3ff080702776@kernel.org>
In-Reply-To: <20220601101140.170504-5-krzysztof.kozlowski@linaro.org>

On 1.06.22 13:11, Krzysztof Kozlowski wrote:
> Add device node for CPU-memory BWMON device (bandwidth monitoring) on
> SDM845 measuring bandwidth between CPU (gladiator_noc) and Last Level
> Cache (memnoc).  Using this BWMON allows removing fixed bandwidth
> votes from cpufreq (CPU nodes) and thus achieving high memory throughput
> even at lower CPU frequencies.
> 
> Performance impact (SDM845-MTP RB3 board, linux next-20220422):
> 1. No noticeable impact when running with schedutil or performance
>     governors.
> 
> 2. When comparing to customized kernel with synced interconnects and
>     without bandwidth votes from CPU freq, the sysbench memory tests
>     show significant improvement with bwmon for blocksizes past the L3
>     cache.  The results for such superficial comparison:
> 
> sysbench memory test, results in MB/s (higher is better)
>   bs kB |  type |    V  | V+no bw votes | bwmon | benefit %
>       1 | W/seq | 14795 |          4816 |  4985 |      3.5%
>      64 | W/seq | 41987 |         10334 | 10433 |      1.0%
>    4096 | W/seq | 29768 |          8728 | 32007 |    266.7%
>   65536 | W/seq | 17711 |          4846 | 18399 |    279.6%
> 262144 | W/seq | 16112 |          4538 | 17429 |    284.1%
>      64 | R/seq | 61202 |         67092 | 66804 |     -0.4%
>    4096 | R/seq | 23871 |          5458 | 24307 |    345.4%
>   65536 | R/seq | 18554 |          4240 | 18685 |    340.7%
> 262144 | R/seq | 17524 |          4207 | 17774 |    322.4%
>      64 | W/rnd |  2663 |          1098 |  1119 |      1.9%
>   65536 | W/rnd |   600 |           316 |   610 |     92.7%
>      64 | R/rnd |  4915 |          4784 |  4594 |     -4.0%
>   65536 | R/rnd |   664 |           281 |   678 |    140.7%
> 
> Legend:
> bs kB: block size in KB (small block size means only L1-3 caches are
>        used)
> type: R - read, W - write, seq - sequential, rnd - random
> V: vanilla (next-20220422)
> V + no bw votes: vanilla without bandwidth votes from CPU freq
> bwmon: bwmon without bandwidth votes from CPU freq
> benefit %: difference between vanilla without bandwidth votes and bwmon
>             (higher is better)
> 

OK, now I see! So bwmon shows performance similar to the current
cpufreq-based bandwidth scaling. If you add bwmon on top of vanilla, are
the results close or the same? Is the plan to remove the cpufreq-based
bandwidth scaling and switch to bwmon? It might improve power consumption
in some scenarios.
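
To make the comparison concrete, here is a rough sketch that recomputes
the "benefit %" column from the quoted table and adds a second column
comparing bwmon directly against vanilla. The numbers are copied from
the table above; the helper name pct_change and the "vs V" column are
only illustrative and not part of the patch.

```python
# Rows copied from the quoted sysbench table:
# (block size in kB, type, V, V + no bw votes, bwmon), all results in MB/s.
ROWS = [
    (1,      "W/seq", 14795,  4816,  4985),
    (64,     "W/seq", 41987, 10334, 10433),
    (4096,   "W/seq", 29768,  8728, 32007),
    (65536,  "W/seq", 17711,  4846, 18399),
    (262144, "W/seq", 16112,  4538, 17429),
    (64,     "R/seq", 61202, 67092, 66804),
    (4096,   "R/seq", 23871,  5458, 24307),
    (65536,  "R/seq", 18554,  4240, 18685),
    (262144, "R/seq", 17524,  4207, 17774),
    (64,     "W/rnd",  2663,  1098,  1119),
    (65536,  "W/rnd",   600,   316,   610),
    (64,     "R/rnd",  4915,  4784,  4594),
    (65536,  "R/rnd",   664,   281,   678),
]


def pct_change(new, base):
    """Relative difference of `new` against `base`, in percent."""
    return (new - base) / base * 100.0


print(f"{'bs kB':>7} | {'type':>5} | {'benefit %':>9} | {'vs V %':>7}")
for bs, kind, vanilla, no_votes, bwmon in ROWS:
    # "benefit %" in the table: bwmon against vanilla without bw votes.
    benefit = pct_change(bwmon, no_votes)
    # Extra column (not in the table): bwmon against plain vanilla.
    vs_vanilla = pct_change(bwmon, vanilla)
    print(f"{bs:>7} | {kind:>5} | {benefit:>9.1f} | {vs_vanilla:>7.1f}")
```

For the rows with block sizes of 4096 kB and above, the second column
stays within roughly 10% of vanilla, while the small-block write rows
come out well below it.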

Thanks,
Georgi

> Co-developed-by: Thara Gopinath <thara.gopinath@linaro.org>
> Signed-off-by: Thara Gopinath <thara.gopinath@linaro.org>
> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
> ---
>   arch/arm64/boot/dts/qcom/sdm845.dtsi | 54 ++++++++++++++++++++++++++++
>   1 file changed, 54 insertions(+)
> 
> diff --git a/arch/arm64/boot/dts/qcom/sdm845.dtsi b/arch/arm64/boot/dts/qcom/sdm845.dtsi
> index 83e8b63f0910..adffb9c70566 100644
> --- a/arch/arm64/boot/dts/qcom/sdm845.dtsi
> +++ b/arch/arm64/boot/dts/qcom/sdm845.dtsi
> @@ -2026,6 +2026,60 @@ llcc: system-cache-controller@1100000 {
>   			interrupts = <GIC_SPI 582 IRQ_TYPE_LEVEL_HIGH>;
>   		};
>   
> +		pmu@1436400 {
> +			compatible = "qcom,sdm845-cpu-bwmon";
> +			reg = <0 0x01436400 0 0x600>;
> +
> +			interrupts = <GIC_SPI 581 IRQ_TYPE_LEVEL_HIGH>;
> +
> +			interconnects = <&gladiator_noc MASTER_APPSS_PROC 3 &mem_noc SLAVE_EBI1 3>,
> +					<&osm_l3 MASTER_OSM_L3_APPS &osm_l3 SLAVE_OSM_L3>;
> +			interconnect-names = "ddr", "l3c";
> +
> +			operating-points-v2 = <&cpu_bwmon_opp_table>;
> +
> +			cpu_bwmon_opp_table: opp-table {
> +				compatible = "operating-points-v2";
> +
> +				/*
> +				 * The interconnect paths bandwidths taken from
> +				 * cpu4_opp_table bandwidth.
> +				 * They also match different tables from
> +				 * msm-4.9 downstream kernel:
> +				 *  - the gladiator_noc-mem_noc from bandwidth
> +				 *    table of qcom,llccbw (property qcom,bw-tbl);
> +				 *    bus width: 4 bytes;
> +				 *  - the OSM L3 from bandwidth table of
> +				 *    qcom,cpu4-l3lat-mon (qcom,core-dev-table);
> +				 *    bus width: 16 bytes;
> +				 */
> +				opp-0 {
> +					opp-peak-kBps = <800000 4800000>;
> +				};
> +				opp-1 {
> +					opp-peak-kBps = <1804000 9216000>;
> +				};
> +				opp-2 {
> +					opp-peak-kBps = <2188000 11980800>;
> +				};
> +				opp-3 {
> +					opp-peak-kBps = <3072000 15052800>;
> +				};
> +				opp-4 {
> +					opp-peak-kBps = <4068000 19353600>;
> +				};
> +				opp-5 {
> +					opp-peak-kBps = <5412000 20889600>;
> +				};
> +				opp-6 {
> +					opp-peak-kBps = <6220000 22425600>;
> +				};
> +				opp-7 {
> +					opp-peak-kBps = <7216000 25497600>;
> +				};
> +			};
> +		};
> +
>   		pcie0: pci@1c00000 {
>   			compatible = "qcom,pcie-sdm845";
>   			reg = <0 0x01c00000 0 0x2000>,
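
The comment in the quoted OPP table gives a bus width for each path
(4 bytes for gladiator_noc-mem_noc, 16 bytes for OSM L3). As a sanity
check, the sketch below divides each opp-peak-kBps value by that width.
It assumes peak bandwidth equals clock rate times bus width with one
transfer per cycle; that relation and the name implied_khz are my
assumptions and are not stated in the patch.

```python
# (ddr_kBps, l3_kBps) pairs copied from opp-0 .. opp-7 in the quoted node.
OPP_PEAK_KBPS = [
    (800000,   4800000),
    (1804000,  9216000),
    (2188000, 11980800),
    (3072000, 15052800),
    (4068000, 19353600),
    (5412000, 20889600),
    (6220000, 22425600),
    (7216000, 25497600),
]

# Bus widths taken from the comment in the quoted OPP table.
DDR_BUS_WIDTH_BYTES = 4
L3_BUS_WIDTH_BYTES = 16


def implied_khz(peak_kbps, bus_width_bytes):
    """Rate in kHz implied by a peak bandwidth in kB/s.

    Assumption: bandwidth = rate * bus width, one transfer per cycle.
    """
    quotient, remainder = divmod(peak_kbps, bus_width_bytes)
    if remainder:
        raise ValueError("bandwidth is not a multiple of the bus width")
    return quotient


for index, (ddr_kbps, l3_kbps) in enumerate(OPP_PEAK_KBPS):
    ddr_khz = implied_khz(ddr_kbps, DDR_BUS_WIDTH_BYTES)
    l3_khz = implied_khz(l3_kbps, L3_BUS_WIDTH_BYTES)
    print(f"opp-{index}: ddr {ddr_khz} kHz, l3c {l3_khz} kHz")
```

Every entry divides evenly by its bus width, which is at least
consistent with the values having been derived from frequency tables.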


