Linux ARM-MSM sub-architecture
* [PATCH v3 0/7] drm/msm: adreno: add support for DDR bandwidth scaling via GMU
@ 2024-11-28 10:25 Neil Armstrong
  2024-11-28 10:25 ` [PATCH v3 1/7] drm/msm: adreno: add defines for gpu & gmu frequency table sizes Neil Armstrong
                   ` (6 more replies)
  0 siblings, 7 replies; 26+ messages in thread
From: Neil Armstrong @ 2024-11-28 10:25 UTC (permalink / raw)
  To: Rob Clark, Sean Paul, Konrad Dybcio, Abhinav Kumar,
	Dmitry Baryshkov, Marijn Suijten, David Airlie, Simona Vetter,
	Bjorn Andersson, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Akhil P Oommen
  Cc: linux-arm-msm, dri-devel, freedreno, linux-kernel, devicetree,
	Neil Armstrong

The Adreno GPU Management Unit (GMU) can also vote for DDR bandwidth
along with the frequency and power domain level, but by default we let
the OPP core scale the interconnect DDR path.

While scaling the interconnect path has been sufficient so far, newer
GPUs like the A750 require specific vote parameters and bandwidth to
achieve full functionality.

In order to get the vote values to be used by the GPU Management
Unit (GMU), we need to parse all the possible OPP bandwidths and
create a vote value to be sent to the appropriate Bus Control
Modules (BCMs) declared in the GPU info struct.
The added dev_pm_opp_get_bw() helper is used for this.
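
As a quick sketch, the lookup done at scaling time boils down to the
following (simplified from patch 4 of this series, error handling and
the "not found" case omitted):

	/* Peak bandwidth (in kBps) of the OPP being set */
	unsigned int bw = dev_pm_opp_get_bw(opp, true, 0);
	unsigned int bw_index;

	/* Find the matching entry in the generated bandwidth table */
	for (bw_index = 0; bw_index < gmu->nr_gpu_bws - 1; bw_index++)
		if (bw == gmu->gpu_bw_table[bw_index])
			break;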

The vote array will then be used to dynamically generate the GMU
bw_table sent during the GMU power-up.

Those entries will then be used by passing the appropriate
bandwidth level when voting for a GPU frequency.

This makes sure all resources are voted consistently for the same OPP:
whatever decision the GMU takes, all resource votes will stay
synchronized.

Depends on [1] to avoid crashing when getting OPP bandwidths.

[1] https://lore.kernel.org/all/20241128-topic-opp-fix-assert-index-check-v1-0-cb8bd4c0370e@linaro.org/

Ran full vulkan-cts-1.3.7.3-0-gd71a36db16d98313c431829432a136dbda692a08 with mesa 25.0.0+git3ecf2a0518 on:
- QRD8550
- QRD8650
- HDK8650

Patchset is based on current msm-next including preemption support.

Any feedback is welcome.

Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
---
Changes in v3:
- I didn't take Dmitry's review tags since I significantly changed the patches
- Dropped applied OPP change
- Dropped QUIRK/FEATURE addition/rename in favor of checking the a6xx_info->bcms pointer
- Switch a6xx_info->bcms to a pointer, so the table can easily be shared
- Generate AB votes in advance; the voting was wrong in v2, we need to quantize each bandwidth value
- Do not vote via the GMU if there's only the OFF vote, i.e. when the DT doesn't have the right properties
- Added defines for the a6xx_gmu freq tables to avoid the magic 16 and 4 values
- Renamed gpu_bw_votes to gpu_ib_votes to match the downstream naming
- Changed the parameters of a6xx_hfi_set_freq() to u32 to match the data type we pass
- Drop "request for maximum bus bandwidth usage" and merge it in previous changes
- Link to v2: https://lore.kernel.org/r/20241119-topic-sm8x50-gpu-bw-vote-v2-0-4deb87be2498@linaro.org

Changes in v2:
- opp: rename to dev_pm_opp_get_bw, fix commit message and kerneldoc
- remove quirks that are features and move them to a dedicated .features bitfield
- get icc bcm kerneldoc, and simplify/cleanup a6xx_gmu_rpmh_bw_votes_init()
  - no more copies of data
  - take calculations from icc-rpmh/bcm-voter
  - move into a single cleaner function
- fix a6xx_gmu_set_freq() by not calling dev_pm_opp_set_opp() if !bw_index
- also vote for maximum bus bandwidth usage (AB)
- overall fix typos in commit messages
- Link to v1: https://lore.kernel.org/r/20241113-topic-sm8x50-gpu-bw-vote-v1-0-3b8d39737a9b@linaro.org

---
Neil Armstrong (7):
      drm/msm: adreno: add defines for gpu & gmu frequency table sizes
      drm/msm: adreno: add plumbing to generate bandwidth vote table for GMU
      drm/msm: adreno: dynamically generate GMU bw table
      drm/msm: adreno: find bandwidth index of OPP and set it along freq index
      drm/msm: adreno: enable GMU bandwidth for A740 and A750
      arm64: qcom: dts: sm8550: add interconnect and opp-peak-kBps for GPU
      arm64: qcom: dts: sm8650: add interconnect and opp-peak-kBps for GPU

 arch/arm64/boot/dts/qcom/sm8550.dtsi      |  11 ++
 arch/arm64/boot/dts/qcom/sm8650.dtsi      |  14 +++
 drivers/gpu/drm/msm/adreno/a6xx_catalog.c |  22 ++++
 drivers/gpu/drm/msm/adreno/a6xx_gmu.c     | 197 +++++++++++++++++++++++++++++-
 drivers/gpu/drm/msm/adreno/a6xx_gmu.h     |  27 +++-
 drivers/gpu/drm/msm/adreno/a6xx_gpu.h     |   1 +
 drivers/gpu/drm/msm/adreno/a6xx_hfi.c     |  45 ++++++-
 drivers/gpu/drm/msm/adreno/a6xx_hfi.h     |   5 +
 8 files changed, 309 insertions(+), 13 deletions(-)
---
base-commit: 18ac96e1bd761af2b7c2fc99901e9a813a6f3bb3
change-id: 20241113-topic-sm8x50-gpu-bw-vote-f5e022fe7a47

Best regards,
-- 
Neil Armstrong <neil.armstrong@linaro.org>



* [PATCH v3 1/7] drm/msm: adreno: add defines for gpu & gmu frequency table sizes
  2024-11-28 10:25 [PATCH v3 0/7] drm/msm: adreno: add support for DDR bandwidth scaling via GMU Neil Armstrong
@ 2024-11-28 10:25 ` Neil Armstrong
  2024-11-28 13:24   ` Dmitry Baryshkov
  2024-11-30 20:39   ` Akhil P Oommen
  2024-11-28 10:25 ` [PATCH v3 2/7] drm/msm: adreno: add plumbing to generate bandwidth vote table for GMU Neil Armstrong
                   ` (5 subsequent siblings)
  6 siblings, 2 replies; 26+ messages in thread
From: Neil Armstrong @ 2024-11-28 10:25 UTC (permalink / raw)
  To: Rob Clark, Sean Paul, Konrad Dybcio, Abhinav Kumar,
	Dmitry Baryshkov, Marijn Suijten, David Airlie, Simona Vetter,
	Bjorn Andersson, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Akhil P Oommen
  Cc: linux-arm-msm, dri-devel, freedreno, linux-kernel, devicetree,
	Neil Armstrong

Even if the code uses ARRAY_SIZE() to fill those tables,
it's still a best practice to not use magic values for
tables in structs.

Suggested-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
---
 drivers/gpu/drm/msm/adreno/a6xx_gmu.h | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.h b/drivers/gpu/drm/msm/adreno/a6xx_gmu.h
index b4a79f88ccf45cfe651c86d2a9da39541c5772b3..88f18ea6a38a08b5b171709e5020010947a5d347 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.h
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.h
@@ -19,6 +19,9 @@ struct a6xx_gmu_bo {
 	u64 iova;
 };
 
+#define GMU_MAX_GX_FREQS	16
+#define GMU_MAX_CX_FREQS	4
+
 /*
  * These define the different GMU wake up options - these define how both the
  * CPU and the GMU bring up the hardware
@@ -79,12 +82,12 @@ struct a6xx_gmu {
 	int current_perf_index;
 
 	int nr_gpu_freqs;
-	unsigned long gpu_freqs[16];
-	u32 gx_arc_votes[16];
+	unsigned long gpu_freqs[GMU_MAX_GX_FREQS];
+	u32 gx_arc_votes[GMU_MAX_GX_FREQS];
 
 	int nr_gmu_freqs;
-	unsigned long gmu_freqs[4];
-	u32 cx_arc_votes[4];
+	unsigned long gmu_freqs[GMU_MAX_CX_FREQS];
+	u32 cx_arc_votes[GMU_MAX_CX_FREQS];
 
 	unsigned long freq;
 

-- 
2.34.1



* [PATCH v3 2/7] drm/msm: adreno: add plumbing to generate bandwidth vote table for GMU
  2024-11-28 10:25 [PATCH v3 0/7] drm/msm: adreno: add support for DDR bandwidth scaling via GMU Neil Armstrong
  2024-11-28 10:25 ` [PATCH v3 1/7] drm/msm: adreno: add defines for gpu & gmu frequency table sizes Neil Armstrong
@ 2024-11-28 10:25 ` Neil Armstrong
  2024-11-29 15:21   ` Konrad Dybcio
  2024-11-30 21:49   ` Akhil P Oommen
  2024-11-28 10:25 ` [PATCH v3 3/7] drm/msm: adreno: dynamically generate GMU bw table Neil Armstrong
                   ` (4 subsequent siblings)
  6 siblings, 2 replies; 26+ messages in thread
From: Neil Armstrong @ 2024-11-28 10:25 UTC (permalink / raw)
  To: Rob Clark, Sean Paul, Konrad Dybcio, Abhinav Kumar,
	Dmitry Baryshkov, Marijn Suijten, David Airlie, Simona Vetter,
	Bjorn Andersson, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Akhil P Oommen
  Cc: linux-arm-msm, dri-devel, freedreno, linux-kernel, devicetree,
	Neil Armstrong

The Adreno GPU Management Unit (GMU) can also scale DDR bandwidth along
with the frequency and power domain level, but by default we let the
OPP core scale the interconnect DDR path.

While scaling via the interconnect path was sufficient, newer GPUs
like the A750 require specific vote parameters and bandwidth to
achieve full functionality.

In order to calculate vote values used by the GPU Management
Unit (GMU), we need to parse all the possible OPP Bandwidths and
create a vote value to be sent to the appropriate Bus Control
Modules (BCMs) declared in the GPU info struct.

This vote value is called IB, while on the other side the GMU also
takes another vote called AB, which is a 16-bit quantized value
of the bandwidth against the maximum supported bandwidth.
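
For example, taking the SM8550 values added later in this series
(purely illustrative): with a maximum OPP bandwidth of 16500000 kBps
and the MAX_AB_VOTE scale of 0xffff - 1 introduced here, the
8171875 kBps OPP quantizes to roughly 8171875 * 65534 / 16500000
≈ 32456, i.e. about half of the 16-bit full scale.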

The vote array will then be used to dynamically generate the GMU
bw_table sent during the GMU power-up.

Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
---
 drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 174 ++++++++++++++++++++++++++++++++++
 drivers/gpu/drm/msm/adreno/a6xx_gmu.h |  14 +++
 drivers/gpu/drm/msm/adreno/a6xx_gpu.h |   1 +
 drivers/gpu/drm/msm/adreno/a6xx_hfi.h |   5 +
 4 files changed, 194 insertions(+)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
index 14db7376c712d19446b38152e480bd5a1e0a5198..ee2010a01186721dd377f1655fcf05ddaff77131 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
@@ -9,6 +9,7 @@
 #include <linux/pm_domain.h>
 #include <linux/pm_opp.h>
 #include <soc/qcom/cmd-db.h>
+#include <soc/qcom/tcs.h>
 #include <drm/drm_gem.h>
 
 #include "a6xx_gpu.h"
@@ -1287,6 +1288,131 @@ static int a6xx_gmu_memory_probe(struct a6xx_gmu *gmu)
 	return 0;
 }
 
+/**
+ * struct bcm_db - Auxiliary data pertaining to each Bus Clock Manager (BCM)
+ * @unit: divisor used to convert bytes/sec bw value to an RPMh msg
+ * @width: multiplier used to convert bytes/sec bw value to an RPMh msg
+ * @vcd: virtual clock domain that this bcm belongs to
+ * @reserved: reserved field
+ */
+struct bcm_db {
+	__le32 unit;
+	__le16 width;
+	u8 vcd;
+	u8 reserved;
+};
+
+static u64 bcm_div(u64 num, u32 base)
+{
+	/* Ensure that small votes aren't lost. */
+	if (num && num < base)
+		return 1;
+
+	do_div(num, base);
+
+	return num;
+}
+
+static int a6xx_gmu_rpmh_bw_votes_init(const struct a6xx_info *info,
+				       struct a6xx_gmu *gmu)
+{
+	const struct bcm_db *bcm_data[GMU_MAX_BCMS] = { 0 };
+	unsigned int bcm_index, bw_index, bcm_count = 0;
+
+	if (!info->bcms)
+		return 0;
+
+	/* Retrieve BCM data from cmd-db */
+	for (bcm_index = 0; bcm_index < GMU_MAX_BCMS; bcm_index++) {
+		size_t count;
+
+		/* Stop at first unconfigured bcm */
+		if (!info->bcms[bcm_index].name)
+			break;
+
+		bcm_data[bcm_index] = cmd_db_read_aux_data(
+						info->bcms[bcm_index].name,
+						&count);
+		if (IS_ERR(bcm_data[bcm_index]))
+			return PTR_ERR(bcm_data[bcm_index]);
+
+		if (!count)
+			return -EINVAL;
+
+		++bcm_count;
+	}
+
+	/* Generate BCM votes values for each bandwidth & BCM */
+	for (bw_index = 0; bw_index < gmu->nr_gpu_bws; bw_index++) {
+		u32 *data = gmu->gpu_ib_votes[bw_index];
+		u32 bw = gmu->gpu_bw_table[bw_index];
+
+		/* Calculations loosely copied from bcm_aggregate() & tcs_cmd_gen() */
+		for (bcm_index = 0; bcm_index < bcm_count; bcm_index++) {
+			bool commit = false;
+			u64 peak, vote;
+			u16 width;
+			u32 unit;
+
+			/* Skip unconfigured BCM */
+			if (!bcm_data[bcm_index])
+				continue;
+
+			if (bcm_index == bcm_count - 1 ||
+			    (bcm_data[bcm_index + 1] &&
+			     bcm_data[bcm_index]->vcd != bcm_data[bcm_index + 1]->vcd))
+				commit = true;
+
+			if (!bw) {
+				data[bcm_index] = BCM_TCS_CMD(commit, false, 0, 0);
+				continue;
+			}
+
+			if (info->bcms[bcm_index].fixed) {
+				u32 perfmode = 0;
+
+				if (bw >= info->bcms[bcm_index].perfmode_bw)
+					perfmode = info->bcms[bcm_index].perfmode;
+
+				data[bcm_index] = BCM_TCS_CMD(commit, true, 0, perfmode);
+				continue;
+			}
+
+			/* Multiply the bandwidth by the width of the connection */
+			width = le16_to_cpu(bcm_data[bcm_index]->width);
+			peak = bcm_div((u64)bw * width, info->bcms[bcm_index].buswidth);
+
+			/* Input bandwidth value is in KBps, scale the value to BCM unit */
+			unit = le32_to_cpu(bcm_data[bcm_index]->unit);
+			vote = bcm_div(peak * 1000ULL, unit);
+
+			if (vote > BCM_TCS_CMD_VOTE_MASK)
+				vote = BCM_TCS_CMD_VOTE_MASK;
+
+			data[bcm_index] = BCM_TCS_CMD(commit, true, vote, vote);
+		}
+	}
+
+	/* Generate AB votes which are a quantized bandwidth value */
+	for (bw_index = 0; bw_index < gmu->nr_gpu_bws; bw_index++) {
+		u64 tmp;
+
+		/*
+		 * The AB vote consists of a 16 bit wide quantized level
+		 * against the maximum supported bandwidth.
+		 * Quantization can be calculated as below:
+		 * vote = (bandwidth * 2^16) / max bandwidth
+		 */
+		tmp = gmu->gpu_bw_table[bw_index] * MAX_AB_VOTE;
+
+		/* Divide by the maximum bandwidth to get a quantized value */
+		gmu->gpu_ab_votes[bw_index] =
+			bcm_div(tmp, gmu->gpu_bw_table[gmu->nr_gpu_bws - 1]);
+	}
+
+	return 0;
+}
+
 /* Return the 'arc-level' for the given frequency */
 static unsigned int a6xx_gmu_get_arc_level(struct device *dev,
 					   unsigned long freq)
@@ -1390,12 +1516,15 @@ static int a6xx_gmu_rpmh_arc_votes_init(struct device *dev, u32 *votes,
  * The GMU votes with the RPMh for itself and on behalf of the GPU but we need
  * to construct the list of votes on the CPU and send it over. Query the RPMh
  * voltage levels and build the votes
+ * The GMU can also vote for DDR interconnects, use the OPP bandwidth entries
+ * and BCM parameters to build the votes.
  */
 
 static int a6xx_gmu_rpmh_votes_init(struct a6xx_gmu *gmu)
 {
 	struct a6xx_gpu *a6xx_gpu = container_of(gmu, struct a6xx_gpu, gmu);
 	struct adreno_gpu *adreno_gpu = &a6xx_gpu->base;
+	const struct a6xx_info *info = adreno_gpu->info->a6xx;
 	struct msm_gpu *gpu = &adreno_gpu->base;
 	int ret;
 
@@ -1407,6 +1536,10 @@ static int a6xx_gmu_rpmh_votes_init(struct a6xx_gmu *gmu)
 	ret |= a6xx_gmu_rpmh_arc_votes_init(gmu->dev, gmu->cx_arc_votes,
 		gmu->gmu_freqs, gmu->nr_gmu_freqs, "cx.lvl");
 
+	/* Build the interconnect votes */
+	if (info->bcms && gmu->nr_gpu_bws > 1)
+		ret |= a6xx_gmu_rpmh_bw_votes_init(info, gmu);
+
 	return ret;
 }
 
@@ -1442,10 +1575,43 @@ static int a6xx_gmu_build_freq_table(struct device *dev, unsigned long *freqs,
 	return index;
 }
 
+static int a6xx_gmu_build_bw_table(struct device *dev, unsigned long *bandwidths,
+		u32 size)
+{
+	int count = dev_pm_opp_get_opp_count(dev);
+	struct dev_pm_opp *opp;
+	int i, index = 0;
+	unsigned int bandwidth = 1;
+
+	/*
+	 * The OPP table doesn't contain the "off" bandwidth level so we need to
+	 * add 1 to the table size to account for it
+	 */
+
+	if (WARN(count + 1 > size,
+		"The GMU bandwidth table is being truncated\n"))
+		count = size - 1;
+
+	/* Set the "off" bandwidth */
+	bandwidths[index++] = 0;
+
+	for (i = 0; i < count; i++) {
+		opp = dev_pm_opp_find_bw_ceil(dev, &bandwidth, 0);
+		if (IS_ERR(opp))
+			break;
+
+		dev_pm_opp_put(opp);
+		bandwidths[index++] = bandwidth++;
+	}
+
+	return index;
+}
+
 static int a6xx_gmu_pwrlevels_probe(struct a6xx_gmu *gmu)
 {
 	struct a6xx_gpu *a6xx_gpu = container_of(gmu, struct a6xx_gpu, gmu);
 	struct adreno_gpu *adreno_gpu = &a6xx_gpu->base;
+	const struct a6xx_info *info = adreno_gpu->info->a6xx;
 	struct msm_gpu *gpu = &adreno_gpu->base;
 
 	int ret = 0;
@@ -1472,6 +1638,14 @@ static int a6xx_gmu_pwrlevels_probe(struct a6xx_gmu *gmu)
 
 	gmu->current_perf_index = gmu->nr_gpu_freqs - 1;
 
+	/*
+	 * The GMU also handles GPU Interconnect Votes so build a list
+	 * of DDR bandwidths from the GPU OPP table
+	 */
+	if (info->bcms)
+		gmu->nr_gpu_bws = a6xx_gmu_build_bw_table(&gpu->pdev->dev,
+			gmu->gpu_bw_table, ARRAY_SIZE(gmu->gpu_bw_table));
+
 	/* Build the list of RPMh votes that we'll send to the GMU */
 	return a6xx_gmu_rpmh_votes_init(gmu);
 }
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.h b/drivers/gpu/drm/msm/adreno/a6xx_gmu.h
index 88f18ea6a38a08b5b171709e5020010947a5d347..bdfc106cb3a578c90d7cd84f7d4fe228d761a994 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.h
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.h
@@ -21,6 +21,15 @@ struct a6xx_gmu_bo {
 
 #define GMU_MAX_GX_FREQS	16
 #define GMU_MAX_CX_FREQS	4
+#define GMU_MAX_BCMS		3
+
+struct a6xx_bcm {
+	char *name;
+	unsigned int buswidth;
+	bool fixed;
+	unsigned int perfmode;
+	unsigned int perfmode_bw;
+};
 
 /*
  * These define the different GMU wake up options - these define how both the
@@ -85,6 +94,11 @@ struct a6xx_gmu {
 	unsigned long gpu_freqs[GMU_MAX_GX_FREQS];
 	u32 gx_arc_votes[GMU_MAX_GX_FREQS];
 
+	int nr_gpu_bws;
+	unsigned long gpu_bw_table[GMU_MAX_GX_FREQS];
+	u32 gpu_ib_votes[GMU_MAX_GX_FREQS][GMU_MAX_BCMS];
+	u16 gpu_ab_votes[GMU_MAX_GX_FREQS];
+
 	int nr_gmu_freqs;
 	unsigned long gmu_freqs[GMU_MAX_CX_FREQS];
 	u32 cx_arc_votes[GMU_MAX_CX_FREQS];
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
index 4aceffb6aae89c781facc2a6e4a82b20b341b6cb..9201a53dd341bf432923ffb44947e015208a3d02 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
@@ -44,6 +44,7 @@ struct a6xx_info {
 	u32 gmu_chipid;
 	u32 gmu_cgc_mode;
 	u32 prim_fifo_threshold;
+	const struct a6xx_bcm *bcms;
 };
 
 struct a6xx_gpu {
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_hfi.h b/drivers/gpu/drm/msm/adreno/a6xx_hfi.h
index 528110169398f69f16443a29a1594d19c36fb595..52ba4a07d7b9a709289acd244a751ace9bdaab5d 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_hfi.h
+++ b/drivers/gpu/drm/msm/adreno/a6xx_hfi.h
@@ -173,6 +173,11 @@ struct a6xx_hfi_gx_bw_perf_vote_cmd {
 	u32 bw;
 };
 
+#define AB_VOTE_MASK		GENMASK(31, 16)
+#define MAX_AB_VOTE		(FIELD_MAX(AB_VOTE_MASK) - 1)
+#define AB_VOTE(vote)		FIELD_PREP(AB_VOTE_MASK, (vote))
+#define AB_VOTE_ENABLE		BIT(8)
+
 #define HFI_H2F_MSG_PREPARE_SLUMBER 33
 
 struct a6xx_hfi_prep_slumber_cmd {

-- 
2.34.1



* [PATCH v3 3/7] drm/msm: adreno: dynamically generate GMU bw table
  2024-11-28 10:25 [PATCH v3 0/7] drm/msm: adreno: add support for DDR bandwidth scaling via GMU Neil Armstrong
  2024-11-28 10:25 ` [PATCH v3 1/7] drm/msm: adreno: add defines for gpu & gmu frequency table sizes Neil Armstrong
  2024-11-28 10:25 ` [PATCH v3 2/7] drm/msm: adreno: add plumbing to generate bandwidth vote table for GMU Neil Armstrong
@ 2024-11-28 10:25 ` Neil Armstrong
  2024-11-29 16:56   ` Konrad Dybcio
  2024-11-28 10:25 ` [PATCH v3 4/7] drm/msm: adreno: find bandwidth index of OPP and set it along freq index Neil Armstrong
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 26+ messages in thread
From: Neil Armstrong @ 2024-11-28 10:25 UTC (permalink / raw)
  To: Rob Clark, Sean Paul, Konrad Dybcio, Abhinav Kumar,
	Dmitry Baryshkov, Marijn Suijten, David Airlie, Simona Vetter,
	Bjorn Andersson, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Akhil P Oommen
  Cc: linux-arm-msm, dri-devel, freedreno, linux-kernel, devicetree,
	Neil Armstrong

The Adreno GPU Management Unit (GMU) can also scale the DDR
bandwidth along with the frequency and power domain level, but for
now we statically fill the bw_table with values from the
downstream driver.

Only the first entry is used, which is a disable vote, so we
currently rely on scaling via the Linux interconnect paths.

Let's dynamically generate the bw_table with the vote values
previously calculated from the OPPs.

Those entries will then be used by the GMU when we pass the
appropriate bandwidth level while voting for a GPU frequency.
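
As a sketch, with the three BCMs declared later in this series (SH0,
MC0 and ACV) the generated table ends up looking roughly like this
(addresses come from cmd-db, the vote words are the pre-computed IB
values):

	msg->ddr_cmds_num = 3;
	msg->ddr_cmds_addrs[i] = cmd_db_read_addr(info->bcms[i].name);
	msg->ddr_cmds_data[level][i] = gmu->gpu_ib_votes[level][i];
	msg->bw_level_num = gmu->nr_gpu_bws; /* "off" + one level per OPP bandwidth */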

Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
---
 drivers/gpu/drm/msm/adreno/a6xx_hfi.c | 39 ++++++++++++++++++++++++++++++++---
 1 file changed, 36 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_hfi.c b/drivers/gpu/drm/msm/adreno/a6xx_hfi.c
index cb8844ed46b29c4569d05eb7a24f7b27e173190f..fe1946650425b749bad483dad1e630bc8be83abc 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_hfi.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_hfi.c
@@ -621,6 +621,35 @@ static void a740_build_bw_table(struct a6xx_hfi_msg_bw_table *msg)
 	msg->cnoc_cmds_data[1][0] = 0x60000001;
 }
 
+static void a740_generate_bw_table(const struct a6xx_info *info, struct a6xx_gmu *gmu,
+				   struct a6xx_hfi_msg_bw_table *msg)
+{
+	unsigned int i, j;
+
+	msg->ddr_wait_bitmask = 0x7;
+
+	for (i = 0; i < GMU_MAX_BCMS; i++) {
+		if (!info->bcms[i].name)
+			break;
+		msg->ddr_cmds_addrs[i] = cmd_db_read_addr(info->bcms[i].name);
+	}
+	msg->ddr_cmds_num = i;
+
+	for (i = 0; i < gmu->nr_gpu_bws; ++i)
+		for (j = 0; j < msg->ddr_cmds_num; j++)
+			msg->ddr_cmds_data[i][j] = gmu->gpu_ib_votes[i][j];
+	msg->bw_level_num = gmu->nr_gpu_bws;
+
+	/* TODO also generate CNOC commands */
+
+	msg->cnoc_cmds_num = 1;
+	msg->cnoc_wait_bitmask = 0x1;
+
+	msg->cnoc_cmds_addrs[0] = cmd_db_read_addr("CN0");
+	msg->cnoc_cmds_data[0][0] = 0x40000000;
+	msg->cnoc_cmds_data[1][0] = 0x60000001;
+}
+
 static void a6xx_build_bw_table(struct a6xx_hfi_msg_bw_table *msg)
 {
 	/* Send a single "off" entry since the 630 GMU doesn't do bus scaling */
@@ -664,6 +693,7 @@ static int a6xx_hfi_send_bw_table(struct a6xx_gmu *gmu)
 	struct a6xx_hfi_msg_bw_table *msg;
 	struct a6xx_gpu *a6xx_gpu = container_of(gmu, struct a6xx_gpu, gmu);
 	struct adreno_gpu *adreno_gpu = &a6xx_gpu->base;
+	const struct a6xx_info *info = adreno_gpu->info->a6xx;
 
 	if (gmu->bw_table)
 		goto send;
@@ -690,9 +720,12 @@ static int a6xx_hfi_send_bw_table(struct a6xx_gmu *gmu)
 		a690_build_bw_table(msg);
 	else if (adreno_is_a730(adreno_gpu))
 		a730_build_bw_table(msg);
-	else if (adreno_is_a740_family(adreno_gpu))
-		a740_build_bw_table(msg);
-	else
+	else if (adreno_is_a740_family(adreno_gpu)) {
+		if (info->bcms && gmu->nr_gpu_bws > 1)
+			a740_generate_bw_table(info, gmu, msg);
+		else
+			a740_build_bw_table(msg);
+	} else
 		a6xx_build_bw_table(msg);
 
 	gmu->bw_table = msg;

-- 
2.34.1



* [PATCH v3 4/7] drm/msm: adreno: find bandwidth index of OPP and set it along freq index
  2024-11-28 10:25 [PATCH v3 0/7] drm/msm: adreno: add support for DDR bandwidth scaling via GMU Neil Armstrong
                   ` (2 preceding siblings ...)
  2024-11-28 10:25 ` [PATCH v3 3/7] drm/msm: adreno: dynamically generate GMU bw table Neil Armstrong
@ 2024-11-28 10:25 ` Neil Armstrong
  2024-11-29 15:33   ` Konrad Dybcio
  2024-11-28 10:25 ` [PATCH v3 5/7] drm/msm: adreno: enable GMU bandwidth for A740 and A750 Neil Armstrong
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 26+ messages in thread
From: Neil Armstrong @ 2024-11-28 10:25 UTC (permalink / raw)
  To: Rob Clark, Sean Paul, Konrad Dybcio, Abhinav Kumar,
	Dmitry Baryshkov, Marijn Suijten, David Airlie, Simona Vetter,
	Bjorn Andersson, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Akhil P Oommen
  Cc: linux-arm-msm, dri-devel, freedreno, linux-kernel, devicetree,
	Neil Armstrong

The Adreno GPU Management Unit (GMU) can also scale the DDR bandwidth
along with the frequency and power domain level; until now we have let
the OPP core scale the bandwidth via the interconnect path.

In order to enable bandwidth voting via the GPU Management
Unit (GMU), when an OPP is set by devfreq we also look for
the corresponding bandwidth index in the previously generated
bw_table and pass this value along with the frequency index to the GMU.

The pre-calculated AB vote is appended to the bandwidth index
to inform the GMU firmware of the quantity of bandwidth we need.
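
For illustration, using the AB_VOTE()/AB_VOTE_ENABLE defines added
earlier in this series (the numbers are made up): a bandwidth index of
3 with a pre-computed AB vote of 32456 would be encoded in the HFI
"bw" field as

	bw = 3 | AB_VOTE_ENABLE | AB_VOTE(32456)
	   = 3 | BIT(8) | (32456 << 16)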

Since we now vote for all resources via the GMU, setting the OPP
is no longer needed, so we can completely skip calling
dev_pm_opp_set_opp() in this situation.

Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
---
 drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 23 +++++++++++++++++++++--
 drivers/gpu/drm/msm/adreno/a6xx_gmu.h |  2 +-
 drivers/gpu/drm/msm/adreno/a6xx_hfi.c |  6 +++---
 3 files changed, 25 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
index ee2010a01186721dd377f1655fcf05ddaff77131..c09442ecc861c4e56c81e7e775b9e57baf7d2e51 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
@@ -110,9 +110,11 @@ void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp,
 		       bool suspended)
 {
 	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
+	const struct a6xx_info *info = adreno_gpu->info->a6xx;
 	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
 	struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
 	u32 perf_index;
+	u32 bw_index = 0;
 	unsigned long gpu_freq;
 	int ret = 0;
 
@@ -125,6 +127,21 @@ void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp,
 		if (gpu_freq == gmu->gpu_freqs[perf_index])
 			break;
 
+	/* If enabled, find the corresponding DDR bandwidth index */
+	if (info->bcms && gmu->nr_gpu_bws > 1) {
+		unsigned int bw = dev_pm_opp_get_bw(opp, true, 0);
+
+		for (bw_index = 0; bw_index < gmu->nr_gpu_bws - 1; bw_index++) {
+			if (bw == gmu->gpu_bw_table[bw_index])
+				break;
+		}
+
+		if (bw_index) {
+			bw_index |= AB_VOTE(gmu->gpu_ab_votes[bw_index]);
+			bw_index |= AB_VOTE_ENABLE;
+		}
+	}
+
 	gmu->current_perf_index = perf_index;
 	gmu->freq = gmu->gpu_freqs[perf_index];
 
@@ -140,8 +157,10 @@ void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp,
 		return;
 
 	if (!gmu->legacy) {
-		a6xx_hfi_set_freq(gmu, perf_index);
-		dev_pm_opp_set_opp(&gpu->pdev->dev, opp);
+		a6xx_hfi_set_freq(gmu, perf_index, bw_index);
+		/* With Bandwidth voting, we now vote for all resources, so skip OPP set */
+		if (!bw_index)
+			dev_pm_opp_set_opp(&gpu->pdev->dev, opp);
 		return;
 	}
 
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.h b/drivers/gpu/drm/msm/adreno/a6xx_gmu.h
index bdfc106cb3a578c90d7cd84f7d4fe228d761a994..432b16c4e198939d9bdb968df6170e4ac74fc923 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.h
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.h
@@ -210,7 +210,7 @@ void a6xx_hfi_init(struct a6xx_gmu *gmu);
 int a6xx_hfi_start(struct a6xx_gmu *gmu, int boot_state);
 void a6xx_hfi_stop(struct a6xx_gmu *gmu);
 int a6xx_hfi_send_prep_slumber(struct a6xx_gmu *gmu);
-int a6xx_hfi_set_freq(struct a6xx_gmu *gmu, int index);
+int a6xx_hfi_set_freq(struct a6xx_gmu *gmu, u32 perf_index, u32 bw_index);
 
 bool a6xx_gmu_gx_is_on(struct a6xx_gmu *gmu);
 bool a6xx_gmu_sptprac_is_on(struct a6xx_gmu *gmu);
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_hfi.c b/drivers/gpu/drm/msm/adreno/a6xx_hfi.c
index fe1946650425b749bad483dad1e630bc8be83abc..9f8c6f9157381a6f7b66de766a046dd84e211384 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_hfi.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_hfi.c
@@ -759,13 +759,13 @@ static int a6xx_hfi_send_core_fw_start(struct a6xx_gmu *gmu)
 		sizeof(msg), NULL, 0);
 }
 
-int a6xx_hfi_set_freq(struct a6xx_gmu *gmu, int index)
+int a6xx_hfi_set_freq(struct a6xx_gmu *gmu, u32 freq_index, u32 bw_index)
 {
 	struct a6xx_hfi_gx_bw_perf_vote_cmd msg = { 0 };
 
 	msg.ack_type = 1; /* blocking */
-	msg.freq = index;
-	msg.bw = 0; /* TODO: bus scaling */
+	msg.freq = freq_index;
+	msg.bw = bw_index;
 
 	return a6xx_hfi_send_msg(gmu, HFI_H2F_MSG_GX_BW_PERF_VOTE, &msg,
 		sizeof(msg), NULL, 0);

-- 
2.34.1



* [PATCH v3 5/7] drm/msm: adreno: enable GMU bandwidth for A740 and A750
  2024-11-28 10:25 [PATCH v3 0/7] drm/msm: adreno: add support for DDR bandwidth scaling via GMU Neil Armstrong
                   ` (3 preceding siblings ...)
  2024-11-28 10:25 ` [PATCH v3 4/7] drm/msm: adreno: find bandwidth index of OPP and set it along freq index Neil Armstrong
@ 2024-11-28 10:25 ` Neil Armstrong
  2024-11-28 13:30   ` Dmitry Baryshkov
  2024-11-29 15:25   ` Konrad Dybcio
  2024-11-28 10:25 ` [PATCH v3 6/7] arm64: qcom: dts: sm8550: add interconnect and opp-peak-kBps for GPU Neil Armstrong
  2024-11-28 10:25 ` [PATCH v3 7/7] arm64: qcom: dts: sm8650: " Neil Armstrong
  6 siblings, 2 replies; 26+ messages in thread
From: Neil Armstrong @ 2024-11-28 10:25 UTC (permalink / raw)
  To: Rob Clark, Sean Paul, Konrad Dybcio, Abhinav Kumar,
	Dmitry Baryshkov, Marijn Suijten, David Airlie, Simona Vetter,
	Bjorn Andersson, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Akhil P Oommen
  Cc: linux-arm-msm, dri-devel, freedreno, linux-kernel, devicetree,
	Neil Armstrong

Now that all the DDR bandwidth voting via the GPU Management Unit (GMU)
is in place, declare the Bus Control Modules (BCMs) and the
corresponding parameters in the GPU info struct; the presence of this
table is what enables GMU bandwidth voting.

Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
---
 drivers/gpu/drm/msm/adreno/a6xx_catalog.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_catalog.c b/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
index 0c560e84ad5a53bb4e8a49ba4e153ce9cf33f7ae..edffb7737a97b268bb2986d557969e651988a344 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
@@ -1388,6 +1388,17 @@ static const struct adreno_info a7xx_gpus[] = {
 			.pwrup_reglist = &a7xx_pwrup_reglist,
 			.gmu_chipid = 0x7020100,
 			.gmu_cgc_mode = 0x00020202,
+			.bcms = (const struct a6xx_bcm[]) {
+				{ .name = "SH0", .buswidth = 16 },
+				{ .name = "MC0", .buswidth = 4 },
+				{
+					.name = "ACV",
+					.fixed = true,
+					.perfmode = BIT(3),
+					.perfmode_bw = 16500000,
+				},
+				{ /* sentinel */ },
+			},
 		},
 		.address_space_size = SZ_16G,
 		.preempt_record_size = 4192 * SZ_1K,
@@ -1432,6 +1443,17 @@ static const struct adreno_info a7xx_gpus[] = {
 			.pwrup_reglist = &a7xx_pwrup_reglist,
 			.gmu_chipid = 0x7090100,
 			.gmu_cgc_mode = 0x00020202,
+			.bcms = (const struct a6xx_bcm[]) {
+				{ .name = "SH0", .buswidth = 16 },
+				{ .name = "MC0", .buswidth = 4 },
+				{
+					.name = "ACV",
+					.fixed = true,
+					.perfmode = BIT(2),
+					.perfmode_bw = 10687500,
+				},
+				{ /* sentinel */ },
+			},
 		},
 		.address_space_size = SZ_16G,
 		.preempt_record_size = 3572 * SZ_1K,

-- 
2.34.1



* [PATCH v3 6/7] arm64: qcom: dts: sm8550: add interconnect and opp-peak-kBps for GPU
  2024-11-28 10:25 [PATCH v3 0/7] drm/msm: adreno: add support for DDR bandwidth scaling via GMU Neil Armstrong
                   ` (4 preceding siblings ...)
  2024-11-28 10:25 ` [PATCH v3 5/7] drm/msm: adreno: enable GMU bandwidth for A740 and A750 Neil Armstrong
@ 2024-11-28 10:25 ` Neil Armstrong
  2024-11-28 13:26   ` Dmitry Baryshkov
  2024-11-28 10:25 ` [PATCH v3 7/7] arm64: qcom: dts: sm8650: " Neil Armstrong
  6 siblings, 1 reply; 26+ messages in thread
From: Neil Armstrong @ 2024-11-28 10:25 UTC (permalink / raw)
  To: Rob Clark, Sean Paul, Konrad Dybcio, Abhinav Kumar,
	Dmitry Baryshkov, Marijn Suijten, David Airlie, Simona Vetter,
	Bjorn Andersson, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Akhil P Oommen
  Cc: linux-arm-msm, dri-devel, freedreno, linux-kernel, devicetree,
	Neil Armstrong

Each GPU OPP requires a specific peak DDR bandwidth, so let's add it
to each OPP and also declare the related interconnect path.

Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
---
 arch/arm64/boot/dts/qcom/sm8550.dtsi | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/arch/arm64/boot/dts/qcom/sm8550.dtsi b/arch/arm64/boot/dts/qcom/sm8550.dtsi
index e7774d32fb6d2288748ecec00bf525b2b3c40fbb..545eb52174c704bbefa69189fad9fbff053d8569 100644
--- a/arch/arm64/boot/dts/qcom/sm8550.dtsi
+++ b/arch/arm64/boot/dts/qcom/sm8550.dtsi
@@ -2114,6 +2114,9 @@ gpu: gpu@3d00000 {
 			qcom,gmu = <&gmu>;
 			#cooling-cells = <2>;
 
+			interconnects = <&gem_noc MASTER_GFX3D 0 &mc_virt SLAVE_EBI1 0>;
+			interconnect-names = "gfx-mem";
+
 			status = "disabled";
 
 			zap-shader {
@@ -2127,41 +2130,49 @@ gpu_opp_table: opp-table {
 				opp-680000000 {
 					opp-hz = /bits/ 64 <680000000>;
 					opp-level = <RPMH_REGULATOR_LEVEL_SVS_L1>;
+					opp-peak-kBps = <16500000>;
 				};
 
 				opp-615000000 {
 					opp-hz = /bits/ 64 <615000000>;
 					opp-level = <RPMH_REGULATOR_LEVEL_SVS_L0>;
+					opp-peak-kBps = <16500000>;
 				};
 
 				opp-550000000 {
 					opp-hz = /bits/ 64 <550000000>;
 					opp-level = <RPMH_REGULATOR_LEVEL_SVS>;
+					opp-peak-kBps = <12449218>;
 				};
 
 				opp-475000000 {
 					opp-hz = /bits/ 64 <475000000>;
 					opp-level = <RPMH_REGULATOR_LEVEL_LOW_SVS_L1>;
+					opp-peak-kBps = <8171875>;
 				};
 
 				opp-401000000 {
 					opp-hz = /bits/ 64 <401000000>;
 					opp-level = <RPMH_REGULATOR_LEVEL_LOW_SVS>;
+					opp-peak-kBps = <6671875>;
 				};
 
 				opp-348000000 {
 					opp-hz = /bits/ 64 <348000000>;
 					opp-level = <RPMH_REGULATOR_LEVEL_LOW_SVS_D0>;
+					opp-peak-kBps = <6074218>;
 				};
 
 				opp-295000000 {
 					opp-hz = /bits/ 64 <295000000>;
 					opp-level = <RPMH_REGULATOR_LEVEL_LOW_SVS_D1>;
+					opp-peak-kBps = <6074218>;
 				};
 
 				opp-220000000 {
 					opp-hz = /bits/ 64 <220000000>;
 					opp-level = <RPMH_REGULATOR_LEVEL_LOW_SVS_D2>;
+					opp-peak-kBps = <6074218>;
 				};
 			};
 		};

-- 
2.34.1



* [PATCH v3 7/7] arm64: qcom: dts: sm8650: add interconnect and opp-peak-kBps for GPU
  2024-11-28 10:25 [PATCH v3 0/7] drm/msm: adreno: add support for DDR bandwidth scaling via GMU Neil Armstrong
                   ` (5 preceding siblings ...)
  2024-11-28 10:25 ` [PATCH v3 6/7] arm64: qcom: dts: sm8550: add interconnect and opp-peak-kBps for GPU Neil Armstrong
@ 2024-11-28 10:25 ` Neil Armstrong
  2024-11-28 13:26   ` Dmitry Baryshkov
  6 siblings, 1 reply; 26+ messages in thread
From: Neil Armstrong @ 2024-11-28 10:25 UTC (permalink / raw)
  To: Rob Clark, Sean Paul, Konrad Dybcio, Abhinav Kumar,
	Dmitry Baryshkov, Marijn Suijten, David Airlie, Simona Vetter,
	Bjorn Andersson, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Akhil P Oommen
  Cc: linux-arm-msm, dri-devel, freedreno, linux-kernel, devicetree,
	Neil Armstrong

Each GPU OPP requires a specific peak DDR bandwidth, so let's add it
to each OPP and also declare the related interconnect path.

Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
---
 arch/arm64/boot/dts/qcom/sm8650.dtsi | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/arch/arm64/boot/dts/qcom/sm8650.dtsi b/arch/arm64/boot/dts/qcom/sm8650.dtsi
index 25e47505adcb790d09f1d2726386438487255824..dc85ba8fe1d8f20981b6d7e9672fd7137b915b98 100644
--- a/arch/arm64/boot/dts/qcom/sm8650.dtsi
+++ b/arch/arm64/boot/dts/qcom/sm8650.dtsi
@@ -2636,6 +2636,9 @@ gpu: gpu@3d00000 {
 			qcom,gmu = <&gmu>;
 			#cooling-cells = <2>;
 
+			interconnects = <&gem_noc MASTER_GFX3D 0 &mc_virt SLAVE_EBI1 0>;
+			interconnect-names = "gfx-mem";
+
 			status = "disabled";
 
 			zap-shader {
@@ -2649,56 +2652,67 @@ gpu_opp_table: opp-table {
 				opp-231000000 {
 					opp-hz = /bits/ 64 <231000000>;
 					opp-level = <RPMH_REGULATOR_LEVEL_LOW_SVS_D2>;
+					opp-peak-kBps = <2136718>;
 				};
 
 				opp-310000000 {
 					opp-hz = /bits/ 64 <310000000>;
 					opp-level = <RPMH_REGULATOR_LEVEL_LOW_SVS_D1>;
+					opp-peak-kBps = <6074218>;
 				};
 
 				opp-366000000 {
 					opp-hz = /bits/ 64 <366000000>;
 					opp-level = <RPMH_REGULATOR_LEVEL_LOW_SVS_D0>;
+					opp-peak-kBps = <6074218>;
 				};
 
 				opp-422000000 {
 					opp-hz = /bits/ 64 <422000000>;
 					opp-level = <RPMH_REGULATOR_LEVEL_LOW_SVS>;
+					opp-peak-kBps = <8171875>;
 				};
 
 				opp-500000000 {
 					opp-hz = /bits/ 64 <500000000>;
 					opp-level = <RPMH_REGULATOR_LEVEL_LOW_SVS_L1>;
+					opp-peak-kBps = <8171875>;
 				};
 
 				opp-578000000 {
 					opp-hz = /bits/ 64 <578000000>;
 					opp-level = <RPMH_REGULATOR_LEVEL_SVS>;
+					opp-peak-kBps = <12449218>;
 				};
 
 				opp-629000000 {
 					opp-hz = /bits/ 64 <629000000>;
 					opp-level = <RPMH_REGULATOR_LEVEL_SVS_L0>;
+					opp-peak-kBps = <12449218>;
 				};
 
 				opp-680000000 {
 					opp-hz = /bits/ 64 <680000000>;
 					opp-level = <RPMH_REGULATOR_LEVEL_SVS_L1>;
+					opp-peak-kBps = <16500000>;
 				};
 
 				opp-720000000 {
 					opp-hz = /bits/ 64 <720000000>;
 					opp-level = <RPMH_REGULATOR_LEVEL_SVS_L2>;
+					opp-peak-kBps = <16500000>;
 				};
 
 				opp-770000000 {
 					opp-hz = /bits/ 64 <770000000>;
 					opp-level = <RPMH_REGULATOR_LEVEL_NOM>;
+					opp-peak-kBps = <16500000>;
 				};
 
 				opp-834000000 {
 					opp-hz = /bits/ 64 <834000000>;
 					opp-level = <RPMH_REGULATOR_LEVEL_NOM_L1>;
+					opp-peak-kBps = <16500000>;
 				};
 			};
 		};

-- 
2.34.1



* Re: [PATCH v3 1/7] drm/msm: adreno: add defines for gpu & gmu frequency table sizes
  2024-11-28 10:25 ` [PATCH v3 1/7] drm/msm: adreno: add defines for gpu & gmu frequency table sizes Neil Armstrong
@ 2024-11-28 13:24   ` Dmitry Baryshkov
  2024-11-30 20:39   ` Akhil P Oommen
  1 sibling, 0 replies; 26+ messages in thread
From: Dmitry Baryshkov @ 2024-11-28 13:24 UTC (permalink / raw)
  To: Neil Armstrong
  Cc: Rob Clark, Sean Paul, Konrad Dybcio, Abhinav Kumar,
	Marijn Suijten, David Airlie, Simona Vetter, Bjorn Andersson,
	Rob Herring, Krzysztof Kozlowski, Conor Dooley, Akhil P Oommen,
	linux-arm-msm, dri-devel, freedreno, linux-kernel, devicetree

On Thu, Nov 28, 2024 at 11:25:41AM +0100, Neil Armstrong wrote:
> Even if the code uses ARRAY_SIZE() to fill those tables,
> it's still a best practice to not use magic values for
> tables in structs.
> 
> Suggested-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
> Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
> ---
>  drivers/gpu/drm/msm/adreno/a6xx_gmu.h | 11 +++++++----
>  1 file changed, 7 insertions(+), 4 deletions(-)
> 

Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

-- 
With best wishes
Dmitry


* Re: [PATCH v3 6/7] arm64: qcom: dts: sm8550: add interconnect and opp-peak-kBps for GPU
  2024-11-28 10:25 ` [PATCH v3 6/7] arm64: qcom: dts: sm8550: add interconnect and opp-peak-kBps for GPU Neil Armstrong
@ 2024-11-28 13:26   ` Dmitry Baryshkov
  2024-11-28 14:16     ` Neil Armstrong
  0 siblings, 1 reply; 26+ messages in thread
From: Dmitry Baryshkov @ 2024-11-28 13:26 UTC (permalink / raw)
  To: Neil Armstrong
  Cc: Rob Clark, Sean Paul, Konrad Dybcio, Abhinav Kumar,
	Marijn Suijten, David Airlie, Simona Vetter, Bjorn Andersson,
	Rob Herring, Krzysztof Kozlowski, Conor Dooley, Akhil P Oommen,
	linux-arm-msm, dri-devel, freedreno, linux-kernel, devicetree

On Thu, Nov 28, 2024 at 11:25:46AM +0100, Neil Armstrong wrote:
> Each GPU OPP requires a specific peak DDR bandwidth, let's add
> those to each OPP and also the related interconnect path.
> 
> Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
> ---
>  arch/arm64/boot/dts/qcom/sm8550.dtsi | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/arch/arm64/boot/dts/qcom/sm8550.dtsi b/arch/arm64/boot/dts/qcom/sm8550.dtsi
> index e7774d32fb6d2288748ecec00bf525b2b3c40fbb..545eb52174c704bbefa69189fad9fbff053d8569 100644
> --- a/arch/arm64/boot/dts/qcom/sm8550.dtsi
> +++ b/arch/arm64/boot/dts/qcom/sm8550.dtsi
> @@ -2114,6 +2114,9 @@ gpu: gpu@3d00000 {
>  			qcom,gmu = <&gmu>;
>  			#cooling-cells = <2>;
>  
> +			interconnects = <&gem_noc MASTER_GFX3D 0 &mc_virt SLAVE_EBI1 0>;

QCOM_ICC_TAG_ALWAYS

LGTM otherwise.

> +			interconnect-names = "gfx-mem";
> +
>  			status = "disabled";
>  
>  			zap-shader {
> @@ -2127,41 +2130,49 @@ gpu_opp_table: opp-table {
>  				opp-680000000 {
>  					opp-hz = /bits/ 64 <680000000>;
>  					opp-level = <RPMH_REGULATOR_LEVEL_SVS_L1>;
> +					opp-peak-kBps = <16500000>;
>  				};
>  
>  				opp-615000000 {
>  					opp-hz = /bits/ 64 <615000000>;
>  					opp-level = <RPMH_REGULATOR_LEVEL_SVS_L0>;
> +					opp-peak-kBps = <16500000>;
>  				};
>  
>  				opp-550000000 {
>  					opp-hz = /bits/ 64 <550000000>;
>  					opp-level = <RPMH_REGULATOR_LEVEL_SVS>;
> +					opp-peak-kBps = <12449218>;
>  				};
>  
>  				opp-475000000 {
>  					opp-hz = /bits/ 64 <475000000>;
>  					opp-level = <RPMH_REGULATOR_LEVEL_LOW_SVS_L1>;
> +					opp-peak-kBps = <8171875>;
>  				};
>  
>  				opp-401000000 {
>  					opp-hz = /bits/ 64 <401000000>;
>  					opp-level = <RPMH_REGULATOR_LEVEL_LOW_SVS>;
> +					opp-peak-kBps = <6671875>;
>  				};
>  
>  				opp-348000000 {
>  					opp-hz = /bits/ 64 <348000000>;
>  					opp-level = <RPMH_REGULATOR_LEVEL_LOW_SVS_D0>;
> +					opp-peak-kBps = <6074218>;
>  				};
>  
>  				opp-295000000 {
>  					opp-hz = /bits/ 64 <295000000>;
>  					opp-level = <RPMH_REGULATOR_LEVEL_LOW_SVS_D1>;
> +					opp-peak-kBps = <6074218>;
>  				};
>  
>  				opp-220000000 {
>  					opp-hz = /bits/ 64 <220000000>;
>  					opp-level = <RPMH_REGULATOR_LEVEL_LOW_SVS_D2>;
> +					opp-peak-kBps = <6074218>;
>  				};
>  			};
>  		};
> 
> -- 
> 2.34.1
> 

-- 
With best wishes
Dmitry


* Re: [PATCH v3 7/7] arm64: qcom: dts: sm8650: add interconnect and opp-peak-kBps for GPU
  2024-11-28 10:25 ` [PATCH v3 7/7] arm64: qcom: dts: sm8650: " Neil Armstrong
@ 2024-11-28 13:26   ` Dmitry Baryshkov
  0 siblings, 0 replies; 26+ messages in thread
From: Dmitry Baryshkov @ 2024-11-28 13:26 UTC (permalink / raw)
  To: Neil Armstrong
  Cc: Rob Clark, Sean Paul, Konrad Dybcio, Abhinav Kumar,
	Marijn Suijten, David Airlie, Simona Vetter, Bjorn Andersson,
	Rob Herring, Krzysztof Kozlowski, Conor Dooley, Akhil P Oommen,
	linux-arm-msm, dri-devel, freedreno, linux-kernel, devicetree

On Thu, Nov 28, 2024 at 11:25:47AM +0100, Neil Armstrong wrote:
> Each GPU OPP requires a specific peak DDR bandwidth, let's add
> those to each OPP and also the related interconnect path.
> 
> Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
> ---
>  arch/arm64/boot/dts/qcom/sm8650.dtsi | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
> 
> diff --git a/arch/arm64/boot/dts/qcom/sm8650.dtsi b/arch/arm64/boot/dts/qcom/sm8650.dtsi
> index 25e47505adcb790d09f1d2726386438487255824..dc85ba8fe1d8f20981b6d7e9672fd7137b915b98 100644
> --- a/arch/arm64/boot/dts/qcom/sm8650.dtsi
> +++ b/arch/arm64/boot/dts/qcom/sm8650.dtsi
> @@ -2636,6 +2636,9 @@ gpu: gpu@3d00000 {
>  			qcom,gmu = <&gmu>;
>  			#cooling-cells = <2>;
>  
> +			interconnects = <&gem_noc MASTER_GFX3D 0 &mc_virt SLAVE_EBI1 0>;

QCOM_ICC_TAG_ALWAYS, LGTM otherwise

> +			interconnect-names = "gfx-mem";
> +
>  			status = "disabled";
>  
>  			zap-shader {
> @@ -2649,56 +2652,67 @@ gpu_opp_table: opp-table {
>  				opp-231000000 {
>  					opp-hz = /bits/ 64 <231000000>;
>  					opp-level = <RPMH_REGULATOR_LEVEL_LOW_SVS_D2>;
> +					opp-peak-kBps = <2136718>;
>  				};
>  
>  				opp-310000000 {
>  					opp-hz = /bits/ 64 <310000000>;
>  					opp-level = <RPMH_REGULATOR_LEVEL_LOW_SVS_D1>;
> +					opp-peak-kBps = <6074218>;
>  				};
>  
>  				opp-366000000 {
>  					opp-hz = /bits/ 64 <366000000>;
>  					opp-level = <RPMH_REGULATOR_LEVEL_LOW_SVS_D0>;
> +					opp-peak-kBps = <6074218>;
>  				};
>  
>  				opp-422000000 {
>  					opp-hz = /bits/ 64 <422000000>;
>  					opp-level = <RPMH_REGULATOR_LEVEL_LOW_SVS>;
> +					opp-peak-kBps = <8171875>;
>  				};
>  
>  				opp-500000000 {
>  					opp-hz = /bits/ 64 <500000000>;
>  					opp-level = <RPMH_REGULATOR_LEVEL_LOW_SVS_L1>;
> +					opp-peak-kBps = <8171875>;
>  				};
>  
>  				opp-578000000 {
>  					opp-hz = /bits/ 64 <578000000>;
>  					opp-level = <RPMH_REGULATOR_LEVEL_SVS>;
> +					opp-peak-kBps = <12449218>;
>  				};
>  
>  				opp-629000000 {
>  					opp-hz = /bits/ 64 <629000000>;
>  					opp-level = <RPMH_REGULATOR_LEVEL_SVS_L0>;
> +					opp-peak-kBps = <12449218>;
>  				};
>  
>  				opp-680000000 {
>  					opp-hz = /bits/ 64 <680000000>;
>  					opp-level = <RPMH_REGULATOR_LEVEL_SVS_L1>;
> +					opp-peak-kBps = <16500000>;
>  				};
>  
>  				opp-720000000 {
>  					opp-hz = /bits/ 64 <720000000>;
>  					opp-level = <RPMH_REGULATOR_LEVEL_SVS_L2>;
> +					opp-peak-kBps = <16500000>;
>  				};
>  
>  				opp-770000000 {
>  					opp-hz = /bits/ 64 <770000000>;
>  					opp-level = <RPMH_REGULATOR_LEVEL_NOM>;
> +					opp-peak-kBps = <16500000>;
>  				};
>  
>  				opp-834000000 {
>  					opp-hz = /bits/ 64 <834000000>;
>  					opp-level = <RPMH_REGULATOR_LEVEL_NOM_L1>;
> +					opp-peak-kBps = <16500000>;
>  				};
>  			};
>  		};
> 
> -- 
> 2.34.1
> 

-- 
With best wishes
Dmitry


* Re: [PATCH v3 5/7] drm/msm: adreno: enable GMU bandwidth for A740 and A750
  2024-11-28 10:25 ` [PATCH v3 5/7] drm/msm: adreno: enable GMU bandwidth for A740 and A750 Neil Armstrong
@ 2024-11-28 13:30   ` Dmitry Baryshkov
  2024-11-29 15:25   ` Konrad Dybcio
  1 sibling, 0 replies; 26+ messages in thread
From: Dmitry Baryshkov @ 2024-11-28 13:30 UTC (permalink / raw)
  To: Neil Armstrong
  Cc: Rob Clark, Sean Paul, Konrad Dybcio, Abhinav Kumar,
	Marijn Suijten, David Airlie, Simona Vetter, Bjorn Andersson,
	Rob Herring, Krzysztof Kozlowski, Conor Dooley, Akhil P Oommen,
	linux-arm-msm, dri-devel, freedreno, linux-kernel, devicetree

On Thu, Nov 28, 2024 at 11:25:45AM +0100, Neil Armstrong wrote:
> Now all the DDR bandwidth voting via the GPU Management Unit (GMU)
> is in place, declare the Bus Control Modules (BCMs) and the
> corresponding parameters in the GPU info struct and add the
> GMU_BW_VOTE feature bit to enable it.
> 
> Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
> ---
>  drivers/gpu/drm/msm/adreno/a6xx_catalog.c | 22 ++++++++++++++++++++++
>  1 file changed, 22 insertions(+)
> 

Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

-- 
With best wishes
Dmitry


* Re: [PATCH v3 6/7] arm64: qcom: dts: sm8550: add interconnect and opp-peak-kBps for GPU
  2024-11-28 13:26   ` Dmitry Baryshkov
@ 2024-11-28 14:16     ` Neil Armstrong
  0 siblings, 0 replies; 26+ messages in thread
From: Neil Armstrong @ 2024-11-28 14:16 UTC (permalink / raw)
  To: Dmitry Baryshkov
  Cc: Rob Clark, Sean Paul, Konrad Dybcio, Abhinav Kumar,
	Marijn Suijten, David Airlie, Simona Vetter, Bjorn Andersson,
	Rob Herring, Krzysztof Kozlowski, Conor Dooley, Akhil P Oommen,
	linux-arm-msm, dri-devel, freedreno, linux-kernel, devicetree

On 28/11/2024 14:26, Dmitry Baryshkov wrote:
> On Thu, Nov 28, 2024 at 11:25:46AM +0100, Neil Armstrong wrote:
>> Each GPU OPP requires a specific peak DDR bandwidth, let's add
>> those to each OPP and also the related interconnect path.
>>
>> Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
>> ---
>>   arch/arm64/boot/dts/qcom/sm8550.dtsi | 11 +++++++++++
>>   1 file changed, 11 insertions(+)
>>
>> diff --git a/arch/arm64/boot/dts/qcom/sm8550.dtsi b/arch/arm64/boot/dts/qcom/sm8550.dtsi
>> index e7774d32fb6d2288748ecec00bf525b2b3c40fbb..545eb52174c704bbefa69189fad9fbff053d8569 100644
>> --- a/arch/arm64/boot/dts/qcom/sm8550.dtsi
>> +++ b/arch/arm64/boot/dts/qcom/sm8550.dtsi
>> @@ -2114,6 +2114,9 @@ gpu: gpu@3d00000 {
>>   			qcom,gmu = <&gmu>;
>>   			#cooling-cells = <2>;
>>   
>> +			interconnects = <&gem_noc MASTER_GFX3D 0 &mc_virt SLAVE_EBI1 0>;
> 
> QCOM_ICC_TAG_ALWAYS

Damn, thanks, I forgot those.

Thanks,
Neil

> 
> LGTM otherwise.
> 
>> +			interconnect-names = "gfx-mem";
>> +
>>   			status = "disabled";
>>   
>>   			zap-shader {
>> @@ -2127,41 +2130,49 @@ gpu_opp_table: opp-table {
>>   				opp-680000000 {
>>   					opp-hz = /bits/ 64 <680000000>;
>>   					opp-level = <RPMH_REGULATOR_LEVEL_SVS_L1>;
>> +					opp-peak-kBps = <16500000>;
>>   				};
>>   
>>   				opp-615000000 {
>>   					opp-hz = /bits/ 64 <615000000>;
>>   					opp-level = <RPMH_REGULATOR_LEVEL_SVS_L0>;
>> +					opp-peak-kBps = <16500000>;
>>   				};
>>   
>>   				opp-550000000 {
>>   					opp-hz = /bits/ 64 <550000000>;
>>   					opp-level = <RPMH_REGULATOR_LEVEL_SVS>;
>> +					opp-peak-kBps = <12449218>;
>>   				};
>>   
>>   				opp-475000000 {
>>   					opp-hz = /bits/ 64 <475000000>;
>>   					opp-level = <RPMH_REGULATOR_LEVEL_LOW_SVS_L1>;
>> +					opp-peak-kBps = <8171875>;
>>   				};
>>   
>>   				opp-401000000 {
>>   					opp-hz = /bits/ 64 <401000000>;
>>   					opp-level = <RPMH_REGULATOR_LEVEL_LOW_SVS>;
>> +					opp-peak-kBps = <6671875>;
>>   				};
>>   
>>   				opp-348000000 {
>>   					opp-hz = /bits/ 64 <348000000>;
>>   					opp-level = <RPMH_REGULATOR_LEVEL_LOW_SVS_D0>;
>> +					opp-peak-kBps = <6074218>;
>>   				};
>>   
>>   				opp-295000000 {
>>   					opp-hz = /bits/ 64 <295000000>;
>>   					opp-level = <RPMH_REGULATOR_LEVEL_LOW_SVS_D1>;
>> +					opp-peak-kBps = <6074218>;
>>   				};
>>   
>>   				opp-220000000 {
>>   					opp-hz = /bits/ 64 <220000000>;
>>   					opp-level = <RPMH_REGULATOR_LEVEL_LOW_SVS_D2>;
>> +					opp-peak-kBps = <6074218>;
>>   				};
>>   			};
>>   		};
>>
>> -- 
>> 2.34.1
>>
> 



* Re: [PATCH v3 2/7] drm/msm: adreno: add plumbing to generate bandwidth vote table for GMU
  2024-11-28 10:25 ` [PATCH v3 2/7] drm/msm: adreno: add plumbing to generate bandwidth vote table for GMU Neil Armstrong
@ 2024-11-29 15:21   ` Konrad Dybcio
  2024-12-02  8:41     ` Neil Armstrong
  2024-11-30 21:49   ` Akhil P Oommen
  1 sibling, 1 reply; 26+ messages in thread
From: Konrad Dybcio @ 2024-11-29 15:21 UTC (permalink / raw)
  To: Neil Armstrong, Rob Clark, Sean Paul, Konrad Dybcio,
	Abhinav Kumar, Dmitry Baryshkov, Marijn Suijten, David Airlie,
	Simona Vetter, Bjorn Andersson, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley, Akhil P Oommen
  Cc: linux-arm-msm, dri-devel, freedreno, linux-kernel, devicetree

On 28.11.2024 11:25 AM, Neil Armstrong wrote:
> The Adreno GPU Management Unit (GMU) can also scale DDR Bandwidth along
> the Frequency and Power Domain level, but by default we leave the
> OPP core scale the interconnect ddr path.
> 
> While scaling via the interconnect path was sufficient, newer GPUs
> like the A750 requires specific vote paremeters and bandwidth to
> achieve full functionality.
> 
> In order to calculate vote values used by the GPU Management
> Unit (GMU), we need to parse all the possible OPP Bandwidths and
> create a vote value to be sent to the appropriate Bus Control
> Modules (BCMs) declared in the GPU info struct.
> 
> This vote value is called IB, while on the othe side the GMU also
> takes another vote called AB which is a 16bit quantized value
> of the bandwidth against the maximum supported bandwidth.
> 
> The vote array will then be used to dynamically generate the GMU
> bw_table sent during the GMU power-up.
> 
> Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
> ---
>  drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 174 ++++++++++++++++++++++++++++++++++
>  drivers/gpu/drm/msm/adreno/a6xx_gmu.h |  14 +++
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.h |   1 +
>  drivers/gpu/drm/msm/adreno/a6xx_hfi.h |   5 +
>  4 files changed, 194 insertions(+)
> 
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> index 14db7376c712d19446b38152e480bd5a1e0a5198..ee2010a01186721dd377f1655fcf05ddaff77131 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> @@ -9,6 +9,7 @@
>  #include <linux/pm_domain.h>
>  #include <linux/pm_opp.h>
>  #include <soc/qcom/cmd-db.h>
> +#include <soc/qcom/tcs.h>
>  #include <drm/drm_gem.h>
>  
>  #include "a6xx_gpu.h"
> @@ -1287,6 +1288,131 @@ static int a6xx_gmu_memory_probe(struct a6xx_gmu *gmu)
>  	return 0;
>  }
>  
> +/**
> + * struct bcm_db - Auxiliary data pertaining to each Bus Clock Manager (BCM)
> + * @unit: divisor used to convert bytes/sec bw value to an RPMh msg
> + * @width: multiplier used to convert bytes/sec bw value to an RPMh msg
> + * @vcd: virtual clock domain that this bcm belongs to
> + * @reserved: reserved field
> + */
> +struct bcm_db {
> +	__le32 unit;
> +	__le16 width;
> +	u8 vcd;
> +	u8 reserved;
> +};
> +
> +static u64 bcm_div(u64 num, u32 base)
> +{
> +	/* Ensure that small votes aren't lost. */
> +	if (num && num < base)
> +		return 1;
> +
> +	do_div(num, base);
> +
> +	return num;
> +}

This should live in include/soc/qcom/bcm.h, similarly to tcs.h in
that directory

> +static int a6xx_gmu_rpmh_bw_votes_init(const struct a6xx_info *info,
> +				       struct a6xx_gmu *gmu)
> +{
> +	const struct bcm_db *bcm_data[GMU_MAX_BCMS] = { 0 };
> +	unsigned int bcm_index, bw_index, bcm_count = 0;
> +
> +	if (!info->bcms)
> +		return 0;
> +
> +	/* Retrieve BCM data from cmd-db */
> +	for (bcm_index = 0; bcm_index < GMU_MAX_BCMS; bcm_index++) {
> +		size_t count;
> +
> +		/* Stop at first unconfigured bcm */
> +		if (!info->bcms[bcm_index].name)
> +			break;
> +
> +		bcm_data[bcm_index] = cmd_db_read_aux_data(
> +						info->bcms[bcm_index].name,
> +						&count);
> +		if (IS_ERR(bcm_data[bcm_index]))
> +			return PTR_ERR(bcm_data[bcm_index]);
> +
> +		if (!count)
> +			return -EINVAL;
> +
> +		++bcm_count;
> +	}
> +
> +	/* Generate BCM votes values for each bandwidth & BCM */
> +	for (bw_index = 0; bw_index < gmu->nr_gpu_bws; bw_index++) {
> +		u32 *data = gmu->gpu_ib_votes[bw_index];
> +		u32 bw = gmu->gpu_bw_table[bw_index];
> +
> +		/* Calculations loosely copied from bcm_aggregate() & tcs_cmd_gen() */

Ditto, perhaps this should be exported from icc

[...]

Konrad


* Re: [PATCH v3 5/7] drm/msm: adreno: enable GMU bandwidth for A740 and A750
  2024-11-28 10:25 ` [PATCH v3 5/7] drm/msm: adreno: enable GMU bandwidth for A740 and A750 Neil Armstrong
  2024-11-28 13:30   ` Dmitry Baryshkov
@ 2024-11-29 15:25   ` Konrad Dybcio
  2024-12-02  8:47     ` Neil Armstrong
  1 sibling, 1 reply; 26+ messages in thread
From: Konrad Dybcio @ 2024-11-29 15:25 UTC (permalink / raw)
  To: Neil Armstrong, Rob Clark, Sean Paul, Konrad Dybcio,
	Abhinav Kumar, Dmitry Baryshkov, Marijn Suijten, David Airlie,
	Simona Vetter, Bjorn Andersson, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley, Akhil P Oommen
  Cc: linux-arm-msm, dri-devel, freedreno, linux-kernel, devicetree

On 28.11.2024 11:25 AM, Neil Armstrong wrote:
> Now all the DDR bandwidth voting via the GPU Management Unit (GMU)
> is in place, declare the Bus Control Modules (BCMs) and the
> corresponding parameters in the GPU info struct and add the
> GMU_BW_VOTE feature bit to enable it.
> 
> Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
> ---
>  drivers/gpu/drm/msm/adreno/a6xx_catalog.c | 22 ++++++++++++++++++++++
>  1 file changed, 22 insertions(+)
> 
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_catalog.c b/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
> index 0c560e84ad5a53bb4e8a49ba4e153ce9cf33f7ae..edffb7737a97b268bb2986d557969e651988a344 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
> @@ -1388,6 +1388,17 @@ static const struct adreno_info a7xx_gpus[] = {
>  			.pwrup_reglist = &a7xx_pwrup_reglist,
>  			.gmu_chipid = 0x7020100,
>  			.gmu_cgc_mode = 0x00020202,
> +			.bcms = (const struct a6xx_bcm[]) {
> +				{ .name = "SH0", .buswidth = 16 },
> +				{ .name = "MC0", .buswidth = 4 },
> +				{
> +					.name = "ACV",
> +					.fixed = true,
> +					.perfmode = BIT(3),
> +					.perfmode_bw = 16500000,
> +				},
> +				{ /* sentinel */ },
> +			},

This is not going to fly the second there's two SoCs implementing the
same GPU with a difference in bus topology. I think we could add
something like drvdata to ICC nodes and use it for BCMs on icc-rpmh.
Then, we could retrieve it from the interconnect path we get from the
dt node. It would also reduce duplication.
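
A purely hypothetical sketch of that idea (none of these helpers exist
today): the icc-rpmh provider could stash a pointer to its BCM
description in each node, and the GPU driver could then do something
like

	path = of_icc_get(&pdev->dev, "gfx-mem");
	bcm_info = icc_path_get_drvdata(path); /* hypothetical helper */

instead of carrying the BCM parameters in the GPU catalog.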

Konrad


* Re: [PATCH v3 4/7] drm/msm: adreno: find bandwidth index of OPP and set it along freq index
  2024-11-28 10:25 ` [PATCH v3 4/7] drm/msm: adreno: find bandwidth index of OPP and set it along freq index Neil Armstrong
@ 2024-11-29 15:33   ` Konrad Dybcio
  2024-11-30 22:02     ` Akhil P Oommen
  0 siblings, 1 reply; 26+ messages in thread
From: Konrad Dybcio @ 2024-11-29 15:33 UTC (permalink / raw)
  To: Neil Armstrong, Rob Clark, Sean Paul, Konrad Dybcio,
	Abhinav Kumar, Dmitry Baryshkov, Marijn Suijten, David Airlie,
	Simona Vetter, Bjorn Andersson, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley, Akhil P Oommen
  Cc: linux-arm-msm, dri-devel, freedreno, linux-kernel, devicetree

On 28.11.2024 11:25 AM, Neil Armstrong wrote:
> The Adreno GPU Management Unit (GMU) can also scale the DDR Bandwidth
> along the Frequency and Power Domain level, until now we left the OPP
> core scale the OPP bandwidth via the interconnect path.
> 
> In order to enable bandwidth voting via the GPU Management
> Unit (GMU), when an opp is set by devfreq we also look for
> the corresponding bandwidth index in the previously generated
> bw_table and pass this value along the frequency index to the GMU.
> 
> The AB pre-calculated vote is appended to the bandwidth index
> to inform the GMU firmware the quantity of bandwidth we need.
> 
> Since we now vote for all resources via the GMU, setting the OPP
> is no longer needed, so we can completely skip calling
> dev_pm_opp_set_opp() in this situation.
> 
> Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
> ---
>  drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 23 +++++++++++++++++++++--
>  drivers/gpu/drm/msm/adreno/a6xx_gmu.h |  2 +-
>  drivers/gpu/drm/msm/adreno/a6xx_hfi.c |  6 +++---
>  3 files changed, 25 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> index ee2010a01186721dd377f1655fcf05ddaff77131..c09442ecc861c4e56c81e7e775b9e57baf7d2e51 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> @@ -110,9 +110,11 @@ void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp,
>  		       bool suspended)
>  {
>  	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> +	const struct a6xx_info *info = adreno_gpu->info->a6xx;
>  	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
>  	struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
>  	u32 perf_index;
> +	u32 bw_index = 0;
>  	unsigned long gpu_freq;
>  	int ret = 0;
>  
> @@ -125,6 +127,21 @@ void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp,
>  		if (gpu_freq == gmu->gpu_freqs[perf_index])
>  			break;
>  
> +	/* If enabled, find the corresponding DDR bandwidth index */
> +	if (info->bcms && gmu->nr_gpu_bws > 1) {
> +		unsigned int bw = dev_pm_opp_get_bw(opp, true, 0);
> +
> +		for (bw_index = 0; bw_index < gmu->nr_gpu_bws - 1; bw_index++) {
> +			if (bw == gmu->gpu_bw_table[bw_index])
> +				break;
> +		}
> +
> +		if (bw_index) {
> +			bw_index |= AB_VOTE(gmu->gpu_ab_votes[bw_index]);
> +			bw_index |= AB_VOTE_ENABLE;
> +		}
> +	}

If we couple frequency levels with bw levels (i.e. duplicate the highest
bandwidth a couple times), we can drop all this search logic..
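
Roughly, as an untested sketch (reusing dev_pm_opp_get_bw() from this series and
the gmu fields added in patch 2), the table build could then become:

	unsigned int i;

	for (i = 0; i < gmu->nr_gpu_freqs; i++) {
		struct dev_pm_opp *opp;

		opp = dev_pm_opp_find_freq_exact(&gpu->pdev->dev,
						 gmu->gpu_freqs[i], true);
		if (IS_ERR(opp))
			break;

		/* Duplicate bandwidths are fine, only the index matters */
		gmu->gpu_bw_table[i] = dev_pm_opp_get_bw(opp, true, 0);
		dev_pm_opp_put(opp);
	}
	gmu->nr_gpu_bws = i;

and a6xx_gmu_set_freq() could then reuse perf_index directly as the bandwidth
level.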

> +
>  	gmu->current_perf_index = perf_index;
>  	gmu->freq = gmu->gpu_freqs[perf_index];
>  
> @@ -140,8 +157,10 @@ void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp,
>  		return;
>  
>  	if (!gmu->legacy) {
> -		a6xx_hfi_set_freq(gmu, perf_index);
> -		dev_pm_opp_set_opp(&gpu->pdev->dev, opp);
> +		a6xx_hfi_set_freq(gmu, perf_index, bw_index);
> +		/* With Bandwidth voting, we now vote for all resources, so skip OPP set */
> +		if (!bw_index)
> +			dev_pm_opp_set_opp(&gpu->pdev->dev, opp);

..and then it would come down to..

if (!info->bcms)
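
i.e., in a6xx_gmu_set_freq(), something like (sketch):

		a6xx_hfi_set_freq(gmu, perf_index, bw_index);
		/*
		 * GMU bandwidth voting covers DDR, so only fall back to the
		 * interconnect path when the catalog declares no BCMs.
		 */
		if (!info->bcms)
			dev_pm_opp_set_opp(&gpu->pdev->dev, opp);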

Konrad

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 3/7] drm/msm: adreno: dynamically generate GMU bw table
  2024-11-28 10:25 ` [PATCH v3 3/7] drm/msm: adreno: dynamically generate GMU bw table Neil Armstrong
@ 2024-11-29 16:56   ` Konrad Dybcio
  0 siblings, 0 replies; 26+ messages in thread
From: Konrad Dybcio @ 2024-11-29 16:56 UTC (permalink / raw)
  To: Neil Armstrong, Rob Clark, Sean Paul, Konrad Dybcio,
	Abhinav Kumar, Dmitry Baryshkov, Marijn Suijten, David Airlie,
	Simona Vetter, Bjorn Andersson, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley, Akhil P Oommen
  Cc: linux-arm-msm, dri-devel, freedreno, linux-kernel, devicetree

On 28.11.2024 11:25 AM, Neil Armstrong wrote:
> The Adreno GPU Management Unit (GMU) can also scale the ddr
> bandwidth along the frequency and power domain level, but for
> now we statically fill the bw_table with values from the
> downstream driver.
> 
> Only the first entry is used, which is a disable vote, so we
> currently rely on scaling via the linux interconnect paths.
> 
> Let's dynamically generate the bw_table with the vote values
> previously calculated from the OPPs.
> 
> Those entried will then be used by the GMU when passing the

entries

> appropriate bandwidth level while voting for a gpu frequency.
> 
> Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
> ---

[...]

>  drivers/gpu/drm/msm/adreno/a6xx_hfi.c | 39 ++++++++++++++++++++++++++++++++---
>  1 file changed, 36 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_hfi.c b/drivers/gpu/drm/msm/adreno/a6xx_hfi.c
> index cb8844ed46b29c4569d05eb7a24f7b27e173190f..fe1946650425b749bad483dad1e630bc8be83abc 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_hfi.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_hfi.c
> @@ -621,6 +621,35 @@ static void a740_build_bw_table(struct a6xx_hfi_msg_bw_table *msg)
>  	msg->cnoc_cmds_data[1][0] = 0x60000001;
>  }
>  
> +static void a740_generate_bw_table(const struct a6xx_info *info, struct a6xx_gmu *gmu,
> +				   struct a6xx_hfi_msg_bw_table *msg)

This should work for all targets

> +{
> +	unsigned int i, j;
> +
> +	msg->ddr_wait_bitmask = 0x7;

GENMASK; this should also be generated dynamically based on the BCM data, there's
logic for it in bcm-voter.c: tcs_list_gen()
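
A loose sketch of that logic (it assumes the cmd-db aux data read in patch 2 is
kept around so the vcd field is still available here):

	u32 wait_bitmask = 0;
	unsigned int i;

	for (i = 0; i < bcm_count; i++) {
		/* Wait on the last command of each virtual clock domain */
		bool commit = (i == bcm_count - 1) ||
			      (bcm_data[i]->vcd != bcm_data[i + 1]->vcd);

		if (commit)
			wait_bitmask |= BIT(i);
	}

	msg->ddr_wait_bitmask = wait_bitmask;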

> +
> +	for (i = 0; i < GMU_MAX_BCMS; i++) {
> +		if (!info->bcms[i].name)
> +			break;
> +		msg->ddr_cmds_addrs[i] = cmd_db_read_addr(info->bcms[i].name);

A7xx GPUs share a common list of BCMs; the buswidth may differ per SoC, and it's
something already stored in the ICC drivers

> +	}
> +	msg->ddr_cmds_num = i;
> +
> +	for (i = 0; i < gmu->nr_gpu_bws; ++i)
> +		for (j = 0; j < msg->ddr_cmds_num; j++)
> +			msg->ddr_cmds_data[i][j] = gmu->gpu_ib_votes[i][j];
> +	msg->bw_level_num = gmu->nr_gpu_bws;
> +
> +	/* TODO also generate CNOC commands */

We only do on/off (0/100 units - kbps?), it seems

> +
> +	msg->cnoc_cmds_num = 1;
> +	msg->cnoc_wait_bitmask = 0x1;
> +
> +	msg->cnoc_cmds_addrs[0] = cmd_db_read_addr("CN0");
> +	msg->cnoc_cmds_data[0][0] = 0x40000000;
> +	msg->cnoc_cmds_data[1][0] = 0x60000001;
> +}
> +
>  static void a6xx_build_bw_table(struct a6xx_hfi_msg_bw_table *msg)
>  {
>  	/* Send a single "off" entry since the 630 GMU doesn't do bus scaling */
> @@ -664,6 +693,7 @@ static int a6xx_hfi_send_bw_table(struct a6xx_gmu *gmu)
>  	struct a6xx_hfi_msg_bw_table *msg;
>  	struct a6xx_gpu *a6xx_gpu = container_of(gmu, struct a6xx_gpu, gmu);
>  	struct adreno_gpu *adreno_gpu = &a6xx_gpu->base;
> +	const struct a6xx_info *info = adreno_gpu->info->a6xx;
>  
>  	if (gmu->bw_table)
>  		goto send;
> @@ -690,9 +720,12 @@ static int a6xx_hfi_send_bw_table(struct a6xx_gmu *gmu)
>  		a690_build_bw_table(msg);
>  	else if (adreno_is_a730(adreno_gpu))
>  		a730_build_bw_table(msg);
> -	else if (adreno_is_a740_family(adreno_gpu))
> -		a740_build_bw_table(msg);
> -	else
> +	else if (adreno_is_a740_family(adreno_gpu)) {
> +		if (info->bcms && gmu->nr_gpu_bws > 1)
> +			a740_generate_bw_table(info, gmu, msg);

This if should come before the hardcoded if-else chain, as it
applies to all platforms
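
i.e. something like this (sketch; the generator is renamed here per the earlier
comment, since it isn't A740-specific):

	/* The dynamically generated table takes precedence when BCMs are declared */
	if (info->bcms && gmu->nr_gpu_bws > 1)
		a6xx_generate_bw_table(info, gmu, msg);
	/* ... the existing hardcoded per-SoC tables follow ... */
	else if (adreno_is_a730(adreno_gpu))
		a730_build_bw_table(msg);
	else if (adreno_is_a740_family(adreno_gpu))
		a740_build_bw_table(msg);
	else
		a6xx_build_bw_table(msg);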

Konrad

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 1/7] drm/msm: adreno: add defines for gpu & gmu frequency table sizes
  2024-11-28 10:25 ` [PATCH v3 1/7] drm/msm: adreno: add defines for gpu & gmu frequency table sizes Neil Armstrong
  2024-11-28 13:24   ` Dmitry Baryshkov
@ 2024-11-30 20:39   ` Akhil P Oommen
  1 sibling, 0 replies; 26+ messages in thread
From: Akhil P Oommen @ 2024-11-30 20:39 UTC (permalink / raw)
  To: Neil Armstrong, Rob Clark, Sean Paul, Konrad Dybcio,
	Abhinav Kumar, Dmitry Baryshkov, Marijn Suijten, David Airlie,
	Simona Vetter, Bjorn Andersson, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley
  Cc: linux-arm-msm, dri-devel, freedreno, linux-kernel, devicetree

On 11/28/2024 3:55 PM, Neil Armstrong wrote:
> Even if the code uses ARRAY_SIZE() to fill those tables,
> it's still a best practice to not use magic values for
> tables in structs.
> 
> Suggested-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
> Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>

Reviewed-by: Akhil P Oommen <quic_akhilpo@quicinc.com>

-Akhil

> ---
>  drivers/gpu/drm/msm/adreno/a6xx_gmu.h | 11 +++++++----
>  1 file changed, 7 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.h b/drivers/gpu/drm/msm/adreno/a6xx_gmu.h
> index b4a79f88ccf45cfe651c86d2a9da39541c5772b3..88f18ea6a38a08b5b171709e5020010947a5d347 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.h
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.h
> @@ -19,6 +19,9 @@ struct a6xx_gmu_bo {
>  	u64 iova;
>  };
>  
> +#define GMU_MAX_GX_FREQS	16
> +#define GMU_MAX_CX_FREQS	4
> +
>  /*
>   * These define the different GMU wake up options - these define how both the
>   * CPU and the GMU bring up the hardware
> @@ -79,12 +82,12 @@ struct a6xx_gmu {
>  	int current_perf_index;
>  
>  	int nr_gpu_freqs;
> -	unsigned long gpu_freqs[16];
> -	u32 gx_arc_votes[16];
> +	unsigned long gpu_freqs[GMU_MAX_GX_FREQS];
> +	u32 gx_arc_votes[GMU_MAX_GX_FREQS];
>  
>  	int nr_gmu_freqs;
> -	unsigned long gmu_freqs[4];
> -	u32 cx_arc_votes[4];
> +	unsigned long gmu_freqs[GMU_MAX_CX_FREQS];
> +	u32 cx_arc_votes[GMU_MAX_CX_FREQS];
>  
>  	unsigned long freq;
>  
> 


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 2/7] drm/msm: adreno: add plumbing to generate bandwidth vote table for GMU
  2024-11-28 10:25 ` [PATCH v3 2/7] drm/msm: adreno: add plumbing to generate bandwidth vote table for GMU Neil Armstrong
  2024-11-29 15:21   ` Konrad Dybcio
@ 2024-11-30 21:49   ` Akhil P Oommen
  2024-12-02  8:46     ` Neil Armstrong
  1 sibling, 1 reply; 26+ messages in thread
From: Akhil P Oommen @ 2024-11-30 21:49 UTC (permalink / raw)
  To: Neil Armstrong, Rob Clark, Sean Paul, Konrad Dybcio,
	Abhinav Kumar, Dmitry Baryshkov, Marijn Suijten, David Airlie,
	Simona Vetter, Bjorn Andersson, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley
  Cc: linux-arm-msm, dri-devel, freedreno, linux-kernel, devicetree

On 11/28/2024 3:55 PM, Neil Armstrong wrote:
> The Adreno GPU Management Unit (GMU) can also scale DDR Bandwidth along
> the Frequency and Power Domain level, but by default we leave the
> OPP core scale the interconnect ddr path.
> 
> While scaling via the interconnect path was sufficient, newer GPUs
> like the A750 require specific vote parameters and bandwidth to
> achieve full functionality.
> 
> In order to calculate vote values used by the GPU Management
> Unit (GMU), we need to parse all the possible OPP Bandwidths and
> create a vote value to be sent to the appropriate Bus Control
> Modules (BCMs) declared in the GPU info struct.
> 
> This vote value is called IB, while on the other side the GMU also
> takes another vote called AB which is a 16bit quantized value
> of the bandwidth against the maximum supported bandwidth.
> 
> The vote array will then be used to dynamically generate the GMU
> bw_table sent during the GMU power-up.
> 
> Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
> ---
>  drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 174 ++++++++++++++++++++++++++++++++++
>  drivers/gpu/drm/msm/adreno/a6xx_gmu.h |  14 +++
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.h |   1 +
>  drivers/gpu/drm/msm/adreno/a6xx_hfi.h |   5 +
>  4 files changed, 194 insertions(+)
> 
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> index 14db7376c712d19446b38152e480bd5a1e0a5198..ee2010a01186721dd377f1655fcf05ddaff77131 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> @@ -9,6 +9,7 @@
>  #include <linux/pm_domain.h>
>  #include <linux/pm_opp.h>
>  #include <soc/qcom/cmd-db.h>
> +#include <soc/qcom/tcs.h>
>  #include <drm/drm_gem.h>
>  
>  #include "a6xx_gpu.h"
> @@ -1287,6 +1288,131 @@ static int a6xx_gmu_memory_probe(struct a6xx_gmu *gmu)
>  	return 0;
>  }
>  
> +/**
> + * struct bcm_db - Auxiliary data pertaining to each Bus Clock Manager (BCM)
> + * @unit: divisor used to convert bytes/sec bw value to an RPMh msg
> + * @width: multiplier used to convert bytes/sec bw value to an RPMh msg
> + * @vcd: virtual clock domain that this bcm belongs to
> + * @reserved: reserved field
> + */
> +struct bcm_db {
> +	__le32 unit;
> +	__le16 width;
> +	u8 vcd;
> +	u8 reserved;
> +};
> +
> +static u64 bcm_div(u64 num, u32 base)
> +{
> +	/* Ensure that small votes aren't lost. */
> +	if (num && num < base)
> +		return 1;
> +
> +	do_div(num, base);
> +
> +	return num;
> +}
> +
> +static int a6xx_gmu_rpmh_bw_votes_init(const struct a6xx_info *info,
> +				       struct a6xx_gmu *gmu)
> +{
> +	const struct bcm_db *bcm_data[GMU_MAX_BCMS] = { 0 };
> +	unsigned int bcm_index, bw_index, bcm_count = 0;
> +
> +	if (!info->bcms)
> +		return 0;
> +
> +	/* Retrieve BCM data from cmd-db */
> +	for (bcm_index = 0; bcm_index < GMU_MAX_BCMS; bcm_index++) {
> +		size_t count;
> +
> +		/* Stop at first unconfigured bcm */
> +		if (!info->bcms[bcm_index].name)
> +			break;
> +
> +		bcm_data[bcm_index] = cmd_db_read_aux_data(
> +						info->bcms[bcm_index].name,
> +						&count);
> +		if (IS_ERR(bcm_data[bcm_index]))
> +			return PTR_ERR(bcm_data[bcm_index]);
> +
> +		if (!count)
> +			return -EINVAL;
> +
> +		++bcm_count;
> +	}
> +
> +	/* Generate BCM votes values for each bandwidth & BCM */
> +	for (bw_index = 0; bw_index < gmu->nr_gpu_bws; bw_index++) {
> +		u32 *data = gmu->gpu_ib_votes[bw_index];
> +		u32 bw = gmu->gpu_bw_table[bw_index];
> +
> +		/* Calculations loosely copied from bcm_aggregate() & tcs_cmd_gen() */
> +		for (bcm_index = 0; bcm_index < bcm_count; bcm_index++) {
> +			bool commit = false;
> +			u64 peak, vote;
> +			u16 width;
> +			u32 unit;
> +
> +			/* Skip unconfigured BCM */
> +			if (!bcm_data[bcm_index])
> +				continue;
> +
> +			if (bcm_index == bcm_count - 1 ||
> +			    (bcm_data[bcm_index + 1] &&
> +			     bcm_data[bcm_index]->vcd != bcm_data[bcm_index + 1]->vcd))
> +				commit = true;
> +
> +			if (!bw) {
> +				data[bcm_index] = BCM_TCS_CMD(commit, false, 0, 0);
> +				continue;
> +			}
> +
> +			if (info->bcms[bcm_index].fixed) {
> +				u32 perfmode = 0;
> +
> +				if (bw >= info->bcms[bcm_index].perfmode_bw)
> +					perfmode = info->bcms[bcm_index].perfmode;
> +
> +				data[bcm_index] = BCM_TCS_CMD(commit, true, 0, perfmode);
> +				continue;
> +			}
> +
> +			/* Multiply the bandwidth by the width of the connection */
> +			width = le16_to_cpu(bcm_data[bcm_index]->width);
> +			peak = bcm_div((u64)bw * width, info->bcms[bcm_index].buswidth);
> +
> +			/* Input bandwidth value is in KBps, scale the value to BCM unit */
> +			unit = le32_to_cpu(bcm_data[bcm_index]->unit);
> +			vote = bcm_div(peak * 1000ULL, unit);
> +
> +			if (vote > BCM_TCS_CMD_VOTE_MASK)
> +				vote = BCM_TCS_CMD_VOTE_MASK;

use clamp()?
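
i.e. (keeping vote as u64):

			/* vote is unsigned, so only the upper bound matters here */
			vote = clamp(vote, 0ULL, (u64)BCM_TCS_CMD_VOTE_MASK);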

> +
> +			data[bcm_index] = BCM_TCS_CMD(commit, true, vote, vote);
> +		}
> +	}
> +
> +	/* Generate AB votes which are a quantized bandwidth value */
> +	for (bw_index = 0; bw_index < gmu->nr_gpu_bws; bw_index++) {
> +		u64 tmp;
> +
> +		/*
> +		 * The AB vote consists of a 16 bit wide quantized level
> +		 * against the maximum supported bandwidth.
> +		 * Quantization can be calculated as below:
> +		 * vote = (bandwidth * 2^16) / max bandwidth
> +		 */
> +		tmp = gmu->gpu_bw_table[bw_index] * MAX_AB_VOTE;
> +
> +		/* Divide by the maximum bandwidth to get a quantized value */
> +		gmu->gpu_ab_votes[bw_index] =
> +			bcm_div(tmp, gmu->gpu_bw_table[gmu->nr_gpu_bws - 1]);
> +	}

So I suppose you are trying to vote AB equal to IB. The aggregation logic
for the two is different, so this will make DDR scale very aggressively. A
more reasonable approach would be to vote a % of the IB vote (25%?). Ideally
we should measure the GPU's bandwidth usage and vote that if you are really
concerned about stability issues. IB is just a floor vote; the GPU can
generate way higher traffic.
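
For example, keeping the quantization above but requesting only a quarter of
the peak as AB would be something like (sketch only; the exact fraction is a
tuning choice):

		/* Sketch: quantize 25% of the peak bandwidth as the AB vote */
		tmp = (u64)gmu->gpu_bw_table[bw_index] * MAX_AB_VOTE / 4;

		gmu->gpu_ab_votes[bw_index] =
			bcm_div(tmp, gmu->gpu_bw_table[gmu->nr_gpu_bws - 1]);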

> +
> +	return 0;
> +}
> +
>  /* Return the 'arc-level' for the given frequency */
>  static unsigned int a6xx_gmu_get_arc_level(struct device *dev,
>  					   unsigned long freq)
> @@ -1390,12 +1516,15 @@ static int a6xx_gmu_rpmh_arc_votes_init(struct device *dev, u32 *votes,
>   * The GMU votes with the RPMh for itself and on behalf of the GPU but we need
>   * to construct the list of votes on the CPU and send it over. Query the RPMh
>   * voltage levels and build the votes
> + * The GMU can also vote for DDR interconnects, use the OPP bandwidth entries
> + * and BCM parameters to build the votes.
>   */
>  
>  static int a6xx_gmu_rpmh_votes_init(struct a6xx_gmu *gmu)
>  {
>  	struct a6xx_gpu *a6xx_gpu = container_of(gmu, struct a6xx_gpu, gmu);
>  	struct adreno_gpu *adreno_gpu = &a6xx_gpu->base;
> +	const struct a6xx_info *info = adreno_gpu->info->a6xx;
>  	struct msm_gpu *gpu = &adreno_gpu->base;
>  	int ret;
>  
> @@ -1407,6 +1536,10 @@ static int a6xx_gmu_rpmh_votes_init(struct a6xx_gmu *gmu)
>  	ret |= a6xx_gmu_rpmh_arc_votes_init(gmu->dev, gmu->cx_arc_votes,
>  		gmu->gmu_freqs, gmu->nr_gmu_freqs, "cx.lvl");
>  
> +	/* Build the interconnect votes */
> +	if (info->bcms && gmu->nr_gpu_bws > 1)
> +		ret |= a6xx_gmu_rpmh_bw_votes_init(info, gmu);
> +
>  	return ret;
>  }
>  
> @@ -1442,10 +1575,43 @@ static int a6xx_gmu_build_freq_table(struct device *dev, unsigned long *freqs,
>  	return index;
>  }
>  
> +static int a6xx_gmu_build_bw_table(struct device *dev, unsigned long *bandwidths,
> +		u32 size)
> +{
> +	int count = dev_pm_opp_get_opp_count(dev);

I am less concerned about this now since you are not voting real AB BW.

-Akhil.

> +	struct dev_pm_opp *opp;
> +	int i, index = 0;
> +	unsigned int bandwidth = 1;
> +
> +	/*
> +	 * The OPP table doesn't contain the "off" bandwidth level so we need to
> +	 * add 1 to the table size to account for it
> +	 */
> +
> +	if (WARN(count + 1 > size,
> +		"The GMU bandwidth table is being truncated\n"))
> +		count = size - 1;
> +
> +	/* Set the "off" bandwidth */
> +	bandwidths[index++] = 0;
> +
> +	for (i = 0; i < count; i++) {
> +		opp = dev_pm_opp_find_bw_ceil(dev, &bandwidth, 0);
> +		if (IS_ERR(opp))
> +			break;
> +
> +		dev_pm_opp_put(opp);
> +		bandwidths[index++] = bandwidth++;
> +	}
> +
> +	return index;
> +}
> +
>  static int a6xx_gmu_pwrlevels_probe(struct a6xx_gmu *gmu)
>  {
>  	struct a6xx_gpu *a6xx_gpu = container_of(gmu, struct a6xx_gpu, gmu);
>  	struct adreno_gpu *adreno_gpu = &a6xx_gpu->base;
> +	const struct a6xx_info *info = adreno_gpu->info->a6xx;
>  	struct msm_gpu *gpu = &adreno_gpu->base;
>  
>  	int ret = 0;
> @@ -1472,6 +1638,14 @@ static int a6xx_gmu_pwrlevels_probe(struct a6xx_gmu *gmu)
>  
>  	gmu->current_perf_index = gmu->nr_gpu_freqs - 1;
>  
> +	/*
> +	 * The GMU also handles GPU Interconnect Votes so build a list
> +	 * of DDR bandwidths from the GPU OPP table
> +	 */
> +	if (info->bcms)
> +		gmu->nr_gpu_bws = a6xx_gmu_build_bw_table(&gpu->pdev->dev,
> +			gmu->gpu_bw_table, ARRAY_SIZE(gmu->gpu_bw_table));
> +
>  	/* Build the list of RPMh votes that we'll send to the GMU */
>  	return a6xx_gmu_rpmh_votes_init(gmu);
>  }
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.h b/drivers/gpu/drm/msm/adreno/a6xx_gmu.h
> index 88f18ea6a38a08b5b171709e5020010947a5d347..bdfc106cb3a578c90d7cd84f7d4fe228d761a994 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.h
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.h
> @@ -21,6 +21,15 @@ struct a6xx_gmu_bo {
>  
>  #define GMU_MAX_GX_FREQS	16
>  #define GMU_MAX_CX_FREQS	4
> +#define GMU_MAX_BCMS		3
> +
> +struct a6xx_bcm {
> +	char *name;
> +	unsigned int buswidth;
> +	bool fixed;
> +	unsigned int perfmode;
> +	unsigned int perfmode_bw;
> +};
>  
>  /*
>   * These define the different GMU wake up options - these define how both the
> @@ -85,6 +94,11 @@ struct a6xx_gmu {
>  	unsigned long gpu_freqs[GMU_MAX_GX_FREQS];
>  	u32 gx_arc_votes[GMU_MAX_GX_FREQS];
>  
> +	int nr_gpu_bws;
> +	unsigned long gpu_bw_table[GMU_MAX_GX_FREQS];
> +	u32 gpu_ib_votes[GMU_MAX_GX_FREQS][GMU_MAX_BCMS];
> +	u16 gpu_ab_votes[GMU_MAX_GX_FREQS];
> +
>  	int nr_gmu_freqs;
>  	unsigned long gmu_freqs[GMU_MAX_CX_FREQS];
>  	u32 cx_arc_votes[GMU_MAX_CX_FREQS];
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> index 4aceffb6aae89c781facc2a6e4a82b20b341b6cb..9201a53dd341bf432923ffb44947e015208a3d02 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> @@ -44,6 +44,7 @@ struct a6xx_info {
>  	u32 gmu_chipid;
>  	u32 gmu_cgc_mode;
>  	u32 prim_fifo_threshold;
> +	const struct a6xx_bcm *bcms;
>  };
>  
>  struct a6xx_gpu {
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_hfi.h b/drivers/gpu/drm/msm/adreno/a6xx_hfi.h
> index 528110169398f69f16443a29a1594d19c36fb595..52ba4a07d7b9a709289acd244a751ace9bdaab5d 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_hfi.h
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_hfi.h
> @@ -173,6 +173,11 @@ struct a6xx_hfi_gx_bw_perf_vote_cmd {
>  	u32 bw;
>  };
>  
> +#define AB_VOTE_MASK		GENMASK(31, 16)
> +#define MAX_AB_VOTE		(FIELD_MAX(AB_VOTE_MASK) - 1)
> +#define AB_VOTE(vote)		FIELD_PREP(AB_VOTE_MASK, (vote))
> +#define AB_VOTE_ENABLE		BIT(8)
> +
>  #define HFI_H2F_MSG_PREPARE_SLUMBER 33
>  
>  struct a6xx_hfi_prep_slumber_cmd {
> 


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 4/7] drm/msm: adreno: find bandwidth index of OPP and set it along freq index
  2024-11-29 15:33   ` Konrad Dybcio
@ 2024-11-30 22:02     ` Akhil P Oommen
  0 siblings, 0 replies; 26+ messages in thread
From: Akhil P Oommen @ 2024-11-30 22:02 UTC (permalink / raw)
  To: Konrad Dybcio, Neil Armstrong, Rob Clark, Sean Paul,
	Konrad Dybcio, Abhinav Kumar, Dmitry Baryshkov, Marijn Suijten,
	David Airlie, Simona Vetter, Bjorn Andersson, Rob Herring,
	Krzysztof Kozlowski, Conor Dooley
  Cc: linux-arm-msm, dri-devel, freedreno, linux-kernel, devicetree

On 11/29/2024 9:03 PM, Konrad Dybcio wrote:
> On 28.11.2024 11:25 AM, Neil Armstrong wrote:
>> The Adreno GPU Management Unit (GMU) can also scale the DDR Bandwidth
>> along the Frequency and Power Domain level, until now we left the OPP
>> core scale the OPP bandwidth via the interconnect path.
>>
>> In order to enable bandwidth voting via the GPU Management
>> Unit (GMU), when an opp is set by devfreq we also look for
>> the corresponding bandwidth index in the previously generated
>> bw_table and pass this value along the frequency index to the GMU.
>>
>> The AB pre-calculated vote is appended to the bandwidth index
>> to inform the GMU firmware the quantity of bandwidth we need.
>>
>> Since we now vote for all resources via the GMU, setting the OPP
>> is no longer needed, so we can completely skip calling
>> dev_pm_opp_set_opp() in this situation.
>>
>> Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
>> ---
>>  drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 23 +++++++++++++++++++++--
>>  drivers/gpu/drm/msm/adreno/a6xx_gmu.h |  2 +-
>>  drivers/gpu/drm/msm/adreno/a6xx_hfi.c |  6 +++---
>>  3 files changed, 25 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
>> index ee2010a01186721dd377f1655fcf05ddaff77131..c09442ecc861c4e56c81e7e775b9e57baf7d2e51 100644
>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
>> @@ -110,9 +110,11 @@ void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp,
>>  		       bool suspended)
>>  {
>>  	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>> +	const struct a6xx_info *info = adreno_gpu->info->a6xx;
>>  	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
>>  	struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
>>  	u32 perf_index;
>> +	u32 bw_index = 0;
>>  	unsigned long gpu_freq;
>>  	int ret = 0;
>>  
>> @@ -125,6 +127,21 @@ void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp,
>>  		if (gpu_freq == gmu->gpu_freqs[perf_index])
>>  			break;
>>  
>> +	/* If enabled, find the corresponding DDR bandwidth index */
>> +	if (info->bcms && gmu->nr_gpu_bws > 1) {
>> +		unsigned int bw = dev_pm_opp_get_bw(opp, true, 0);
>> +
>> +		for (bw_index = 0; bw_index < gmu->nr_gpu_bws - 1; bw_index++) {
>> +			if (bw == gmu->gpu_bw_table[bw_index])
>> +				break;
>> +		}
>> +
>> +		if (bw_index) {
>> +			bw_index |= AB_VOTE(gmu->gpu_ab_votes[bw_index]);
>> +			bw_index |= AB_VOTE_ENABLE;
>> +		}
>> +	}
> 
> If we couple frequency levels with bw levels (i.e. duplicate the highest
> bandwidth a couple times), we can drop all this search logic..

Will that alter the HFI table? I prefer to avoid altering GMU data. We
should not make any assumptions about how the fw would use the data. And we
don't want to hit any surprises on power-management-related things.

Let's use the GMU interfaces exactly how they are used downstream. No
assumptions or improvisation, please!

-Akhil

> 
>> +
>>  	gmu->current_perf_index = perf_index;
>>  	gmu->freq = gmu->gpu_freqs[perf_index];
>>  
>> @@ -140,8 +157,10 @@ void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp,
>>  		return;
>>  
>>  	if (!gmu->legacy) {
>> -		a6xx_hfi_set_freq(gmu, perf_index);
>> -		dev_pm_opp_set_opp(&gpu->pdev->dev, opp);
>> +		a6xx_hfi_set_freq(gmu, perf_index, bw_index);
>> +		/* With Bandwidth voting, we now vote for all resources, so skip OPP set */
>> +		if (!bw_index)
>> +			dev_pm_opp_set_opp(&gpu->pdev->dev, opp);
> 
> ..and then it would come down to..
> 
> if (!info->bcms)
> 
> Konrad


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 2/7] drm/msm: adreno: add plumbing to generate bandwidth vote table for GMU
  2024-11-29 15:21   ` Konrad Dybcio
@ 2024-12-02  8:41     ` Neil Armstrong
  0 siblings, 0 replies; 26+ messages in thread
From: Neil Armstrong @ 2024-12-02  8:41 UTC (permalink / raw)
  To: Konrad Dybcio, Rob Clark, Sean Paul, Konrad Dybcio, Abhinav Kumar,
	Dmitry Baryshkov, Marijn Suijten, David Airlie, Simona Vetter,
	Bjorn Andersson, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Akhil P Oommen
  Cc: linux-arm-msm, dri-devel, freedreno, linux-kernel, devicetree

On 29/11/2024 16:21, Konrad Dybcio wrote:
> On 28.11.2024 11:25 AM, Neil Armstrong wrote:
>> The Adreno GPU Management Unit (GMU) can also scale DDR Bandwidth along
>> the Frequency and Power Domain level, but by default we leave the
>> OPP core scale the interconnect ddr path.
>>
>> While scaling via the interconnect path was sufficient, newer GPUs
>> like the A750 require specific vote parameters and bandwidth to
>> achieve full functionality.
>>
>> In order to calculate vote values used by the GPU Management
>> Unit (GMU), we need to parse all the possible OPP Bandwidths and
>> create a vote value to be sent to the appropriate Bus Control
>> Modules (BCMs) declared in the GPU info struct.
>>
>> This vote value is called IB, while on the other side the GMU also
>> takes another vote called AB which is a 16bit quantized value
>> of the bandwidth against the maximum supported bandwidth.
>>
>> The vote array will then be used to dynamically generate the GMU
>> bw_table sent during the GMU power-up.
>>
>> Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
>> ---
>>   drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 174 ++++++++++++++++++++++++++++++++++
>>   drivers/gpu/drm/msm/adreno/a6xx_gmu.h |  14 +++
>>   drivers/gpu/drm/msm/adreno/a6xx_gpu.h |   1 +
>>   drivers/gpu/drm/msm/adreno/a6xx_hfi.h |   5 +
>>   4 files changed, 194 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
>> index 14db7376c712d19446b38152e480bd5a1e0a5198..ee2010a01186721dd377f1655fcf05ddaff77131 100644
>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
>> @@ -9,6 +9,7 @@
>>   #include <linux/pm_domain.h>
>>   #include <linux/pm_opp.h>
>>   #include <soc/qcom/cmd-db.h>
>> +#include <soc/qcom/tcs.h>
>>   #include <drm/drm_gem.h>
>>   
>>   #include "a6xx_gpu.h"
>> @@ -1287,6 +1288,131 @@ static int a6xx_gmu_memory_probe(struct a6xx_gmu *gmu)
>>   	return 0;
>>   }
>>   
>> +/**
>> + * struct bcm_db - Auxiliary data pertaining to each Bus Clock Manager (BCM)
>> + * @unit: divisor used to convert bytes/sec bw value to an RPMh msg
>> + * @width: multiplier used to convert bytes/sec bw value to an RPMh msg
>> + * @vcd: virtual clock domain that this bcm belongs to
>> + * @reserved: reserved field
>> + */
>> +struct bcm_db {
>> +	__le32 unit;
>> +	__le16 width;
>> +	u8 vcd;
>> +	u8 reserved;
>> +};
>> +
>> +static u64 bcm_div(u64 num, u32 base)
>> +{
>> +	/* Ensure that small votes aren't lost. */
>> +	if (num && num < base)
>> +		return 1;
>> +
>> +	do_div(num, base);
>> +
>> +	return num;
>> +}
> 
> This should live in include/soc/qcom/bcm.h, similarly to tcs.h in
> that directory

Honestly, I don't think so: there's no BCM-specific logic here, we
simply avoid returning 0 after a division.

> 
>> +static int a6xx_gmu_rpmh_bw_votes_init(const struct a6xx_info *info,
>> +				       struct a6xx_gmu *gmu)
>> +{
>> +	const struct bcm_db *bcm_data[GMU_MAX_BCMS] = { 0 };
>> +	unsigned int bcm_index, bw_index, bcm_count = 0;
>> +
>> +	if (!info->bcms)
>> +		return 0;
>> +
>> +	/* Retrieve BCM data from cmd-db */
>> +	for (bcm_index = 0; bcm_index < GMU_MAX_BCMS; bcm_index++) {
>> +		size_t count;
>> +
>> +		/* Stop at first unconfigured bcm */
>> +		if (!info->bcms[bcm_index].name)
>> +			break;
>> +
>> +		bcm_data[bcm_index] = cmd_db_read_aux_data(
>> +						info->bcms[bcm_index].name,
>> +						&count);
>> +		if (IS_ERR(bcm_data[bcm_index]))
>> +			return PTR_ERR(bcm_data[bcm_index]);
>> +
>> +		if (!count)
>> +			return -EINVAL;
>> +
>> +		++bcm_count;
>> +	}
>> +
>> +	/* Generate BCM votes values for each bandwidth & BCM */
>> +	for (bw_index = 0; bw_index < gmu->nr_gpu_bws; bw_index++) {
>> +		u32 *data = gmu->gpu_ib_votes[bw_index];
>> +		u32 bw = gmu->gpu_bw_table[bw_index];
>> +
>> +		/* Calculations loosely copied from bcm_aggregate() & tcs_cmd_gen() */
> 
> Ditto, perhaps this should be exported from icc

I think it's a bad idea to share code because the overall structures and purposes
are completely different, and it will make the gpu maintenance a nightmare.

> 
> [...]
> 
> Konrad

Thanks,
Neil


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 2/7] drm/msm: adreno: add plumbing to generate bandwidth vote table for GMU
  2024-11-30 21:49   ` Akhil P Oommen
@ 2024-12-02  8:46     ` Neil Armstrong
  2024-12-04 15:35       ` Neil Armstrong
  2024-12-04 18:43       ` Akhil P Oommen
  0 siblings, 2 replies; 26+ messages in thread
From: Neil Armstrong @ 2024-12-02  8:46 UTC (permalink / raw)
  To: Akhil P Oommen, Rob Clark, Sean Paul, Konrad Dybcio,
	Abhinav Kumar, Dmitry Baryshkov, Marijn Suijten, David Airlie,
	Simona Vetter, Bjorn Andersson, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley
  Cc: linux-arm-msm, dri-devel, freedreno, linux-kernel, devicetree

On 30/11/2024 22:49, Akhil P Oommen wrote:
> On 11/28/2024 3:55 PM, Neil Armstrong wrote:
>> The Adreno GPU Management Unit (GMU) can also scale DDR Bandwidth along
>> the Frequency and Power Domain level, but by default we leave the
>> OPP core scale the interconnect ddr path.
>>
>> While scaling via the interconnect path was sufficient, newer GPUs
>> like the A750 require specific vote parameters and bandwidth to
>> achieve full functionality.
>>
>> In order to calculate vote values used by the GPU Management
>> Unit (GMU), we need to parse all the possible OPP Bandwidths and
>> create a vote value to be sent to the appropriate Bus Control
>> Modules (BCMs) declared in the GPU info struct.
>>
>> This vote value is called IB, while on the other side the GMU also
>> takes another vote called AB which is a 16bit quantized value
>> of the bandwidth against the maximum supported bandwidth.
>>
>> The vote array will then be used to dynamically generate the GMU
>> bw_table sent during the GMU power-up.
>>
>> Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
>> ---
>>   drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 174 ++++++++++++++++++++++++++++++++++
>>   drivers/gpu/drm/msm/adreno/a6xx_gmu.h |  14 +++
>>   drivers/gpu/drm/msm/adreno/a6xx_gpu.h |   1 +
>>   drivers/gpu/drm/msm/adreno/a6xx_hfi.h |   5 +
>>   4 files changed, 194 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
>> index 14db7376c712d19446b38152e480bd5a1e0a5198..ee2010a01186721dd377f1655fcf05ddaff77131 100644
>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
>> @@ -9,6 +9,7 @@
>>   #include <linux/pm_domain.h>
>>   #include <linux/pm_opp.h>
>>   #include <soc/qcom/cmd-db.h>
>> +#include <soc/qcom/tcs.h>
>>   #include <drm/drm_gem.h>
>>   
>>   #include "a6xx_gpu.h"
>> @@ -1287,6 +1288,131 @@ static int a6xx_gmu_memory_probe(struct a6xx_gmu *gmu)
>>   	return 0;
>>   }
>>   
>> +/**
>> + * struct bcm_db - Auxiliary data pertaining to each Bus Clock Manager (BCM)
>> + * @unit: divisor used to convert bytes/sec bw value to an RPMh msg
>> + * @width: multiplier used to convert bytes/sec bw value to an RPMh msg
>> + * @vcd: virtual clock domain that this bcm belongs to
>> + * @reserved: reserved field
>> + */
>> +struct bcm_db {
>> +	__le32 unit;
>> +	__le16 width;
>> +	u8 vcd;
>> +	u8 reserved;
>> +};
>> +
>> +static u64 bcm_div(u64 num, u32 base)
>> +{
>> +	/* Ensure that small votes aren't lost. */
>> +	if (num && num < base)
>> +		return 1;
>> +
>> +	do_div(num, base);
>> +
>> +	return num;
>> +}
>> +
>> +static int a6xx_gmu_rpmh_bw_votes_init(const struct a6xx_info *info,
>> +				       struct a6xx_gmu *gmu)
>> +{
>> +	const struct bcm_db *bcm_data[GMU_MAX_BCMS] = { 0 };
>> +	unsigned int bcm_index, bw_index, bcm_count = 0;
>> +
>> +	if (!info->bcms)
>> +		return 0;
>> +
>> +	/* Retrieve BCM data from cmd-db */
>> +	for (bcm_index = 0; bcm_index < GMU_MAX_BCMS; bcm_index++) {
>> +		size_t count;
>> +
>> +		/* Stop at first unconfigured bcm */
>> +		if (!info->bcms[bcm_index].name)
>> +			break;
>> +
>> +		bcm_data[bcm_index] = cmd_db_read_aux_data(
>> +						info->bcms[bcm_index].name,
>> +						&count);
>> +		if (IS_ERR(bcm_data[bcm_index]))
>> +			return PTR_ERR(bcm_data[bcm_index]);
>> +
>> +		if (!count)
>> +			return -EINVAL;
>> +
>> +		++bcm_count;
>> +	}
>> +
>> +	/* Generate BCM votes values for each bandwidth & BCM */
>> +	for (bw_index = 0; bw_index < gmu->nr_gpu_bws; bw_index++) {
>> +		u32 *data = gmu->gpu_ib_votes[bw_index];
>> +		u32 bw = gmu->gpu_bw_table[bw_index];
>> +
>> +		/* Calculations loosely copied from bcm_aggregate() & tcs_cmd_gen() */
>> +		for (bcm_index = 0; bcm_index < bcm_count; bcm_index++) {
>> +			bool commit = false;
>> +			u64 peak, vote;
>> +			u16 width;
>> +			u32 unit;
>> +
>> +			/* Skip unconfigured BCM */
>> +			if (!bcm_data[bcm_index])
>> +				continue;
>> +
>> +			if (bcm_index == bcm_count - 1 ||
>> +			    (bcm_data[bcm_index + 1] &&
>> +			     bcm_data[bcm_index]->vcd != bcm_data[bcm_index + 1]->vcd))
>> +				commit = true;
>> +
>> +			if (!bw) {
>> +				data[bcm_index] = BCM_TCS_CMD(commit, false, 0, 0);
>> +				continue;
>> +			}
>> +
>> +			if (info->bcms[bcm_index].fixed) {
>> +				u32 perfmode = 0;
>> +
>> +				if (bw >= info->bcms[bcm_index].perfmode_bw)
>> +					perfmode = info->bcms[bcm_index].perfmode;
>> +
>> +				data[bcm_index] = BCM_TCS_CMD(commit, true, 0, perfmode);
>> +				continue;
>> +			}
>> +
>> +			/* Multiply the bandwidth by the width of the connection */
>> +			width = le16_to_cpu(bcm_data[bcm_index]->width);
>> +			peak = bcm_div((u64)bw * width, info->bcms[bcm_index].buswidth);
>> +
>> +			/* Input bandwidth value is in KBps, scale the value to BCM unit */
>> +			unit = le32_to_cpu(bcm_data[bcm_index]->unit);
>> +			vote = bcm_div(peak * 1000ULL, unit);
>> +
>> +			if (vote > BCM_TCS_CMD_VOTE_MASK)
>> +				vote = BCM_TCS_CMD_VOTE_MASK;
> 
> use clamp()?

Yep, I think I could replace bcm_div with clamp

> 
>> +
>> +			data[bcm_index] = BCM_TCS_CMD(commit, true, vote, vote);
>> +		}
>> +	}
>> +
>> +	/* Generate AB votes which are a quantized bandwidth value */
>> +	for (bw_index = 0; bw_index < gmu->nr_gpu_bws; bw_index++) {
>> +		u64 tmp;
>> +
>> +		/*
>> +		 * The AB vote consists of a 16 bit wide quantized level
>> +		 * against the maximum supported bandwidth.
>> +		 * Quantization can be calculated as below:
>> +		 * vote = (bandwidth * 2^16) / max bandwidth
>> +		 */
>> +		tmp = gmu->gpu_bw_table[bw_index] * MAX_AB_VOTE;
>> +
>> +		/* Divide by the maximum bandwidth to get a quantized value */
>> +		gmu->gpu_ab_votes[bw_index] =
>> +			bcm_div(tmp, gmu->gpu_bw_table[gmu->nr_gpu_bws - 1]);
>> +	}
> 
> So I suppose you are trying to vote AB equal to IB. The aggregation logic
> for the two is different, so this will make DDR scale very aggressively. A
> more reasonable approach would be to vote a % of the IB vote (25%?). Ideally
> we should measure the GPU's bandwidth usage and vote that if you are really
> concerned about stability issues. IB is just a floor vote; the GPU can
> generate way higher traffic.

I think this should be optimized further in a different patchset, so I would
like to make the simplest vote possible here to retain functionality.

So if I understand correctly, I should divide this vote value by 4? Downstream uses 25% by default
when no AB was calculated, but what does that mean exactly?

Is there a counter somewhere to measure the GPU traffic?
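
(For scale, if I read the quantization in this patch right: 25% at the top OPP
would be an AB vote of roughly MAX_AB_VOTE / 4 ≈ 16383 out of 65534, i.e. telling
the GMU to expect about a quarter of the peak bandwidth as sustained traffic.)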

> 
>> +
>> +	return 0;
>> +}
>> +
>>   /* Return the 'arc-level' for the given frequency */
>>   static unsigned int a6xx_gmu_get_arc_level(struct device *dev,
>>   					   unsigned long freq)
>> @@ -1390,12 +1516,15 @@ static int a6xx_gmu_rpmh_arc_votes_init(struct device *dev, u32 *votes,
>>    * The GMU votes with the RPMh for itself and on behalf of the GPU but we need
>>    * to construct the list of votes on the CPU and send it over. Query the RPMh
>>    * voltage levels and build the votes
>> + * The GMU can also vote for DDR interconnects, use the OPP bandwidth entries
>> + * and BCM parameters to build the votes.
>>    */
>>   
>>   static int a6xx_gmu_rpmh_votes_init(struct a6xx_gmu *gmu)
>>   {
>>   	struct a6xx_gpu *a6xx_gpu = container_of(gmu, struct a6xx_gpu, gmu);
>>   	struct adreno_gpu *adreno_gpu = &a6xx_gpu->base;
>> +	const struct a6xx_info *info = adreno_gpu->info->a6xx;
>>   	struct msm_gpu *gpu = &adreno_gpu->base;
>>   	int ret;
>>   
>> @@ -1407,6 +1536,10 @@ static int a6xx_gmu_rpmh_votes_init(struct a6xx_gmu *gmu)
>>   	ret |= a6xx_gmu_rpmh_arc_votes_init(gmu->dev, gmu->cx_arc_votes,
>>   		gmu->gmu_freqs, gmu->nr_gmu_freqs, "cx.lvl");
>>   
>> +	/* Build the interconnect votes */
>> +	if (info->bcms && gmu->nr_gpu_bws > 1)
>> +		ret |= a6xx_gmu_rpmh_bw_votes_init(info, gmu);
>> +
>>   	return ret;
>>   }
>>   
>> @@ -1442,10 +1575,43 @@ static int a6xx_gmu_build_freq_table(struct device *dev, unsigned long *freqs,
>>   	return index;
>>   }
>>   
>> +static int a6xx_gmu_build_bw_table(struct device *dev, unsigned long *bandwidths,
>> +		u32 size)
>> +{
>> +	int count = dev_pm_opp_get_opp_count(dev);
> 
> I am less concerned about this now since you are not voting real AB BW.

Sorry I don't understand

Thanks,
Neil

> 
> -Akhil.
> 
>> +	struct dev_pm_opp *opp;
>> +	int i, index = 0;
>> +	unsigned int bandwidth = 1;
>> +
>> +	/*
>> +	 * The OPP table doesn't contain the "off" bandwidth level so we need to
>> +	 * add 1 to the table size to account for it
>> +	 */
>> +
>> +	if (WARN(count + 1 > size,
>> +		"The GMU bandwidth table is being truncated\n"))
>> +		count = size - 1;
>> +
>> +	/* Set the "off" bandwidth */
>> +	bandwidths[index++] = 0;
>> +
>> +	for (i = 0; i < count; i++) {
>> +		opp = dev_pm_opp_find_bw_ceil(dev, &bandwidth, 0);
>> +		if (IS_ERR(opp))
>> +			break;
>> +
>> +		dev_pm_opp_put(opp);
>> +		bandwidths[index++] = bandwidth++;
>> +	}
>> +
>> +	return index;
>> +}
>> +
>>   static int a6xx_gmu_pwrlevels_probe(struct a6xx_gmu *gmu)
>>   {
>>   	struct a6xx_gpu *a6xx_gpu = container_of(gmu, struct a6xx_gpu, gmu);
>>   	struct adreno_gpu *adreno_gpu = &a6xx_gpu->base;
>> +	const struct a6xx_info *info = adreno_gpu->info->a6xx;
>>   	struct msm_gpu *gpu = &adreno_gpu->base;
>>   
>>   	int ret = 0;
>> @@ -1472,6 +1638,14 @@ static int a6xx_gmu_pwrlevels_probe(struct a6xx_gmu *gmu)
>>   
>>   	gmu->current_perf_index = gmu->nr_gpu_freqs - 1;
>>   
>> +	/*
>> +	 * The GMU also handles GPU Interconnect Votes so build a list
>> +	 * of DDR bandwidths from the GPU OPP table
>> +	 */
>> +	if (info->bcms)
>> +		gmu->nr_gpu_bws = a6xx_gmu_build_bw_table(&gpu->pdev->dev,
>> +			gmu->gpu_bw_table, ARRAY_SIZE(gmu->gpu_bw_table));
>> +
>>   	/* Build the list of RPMh votes that we'll send to the GMU */
>>   	return a6xx_gmu_rpmh_votes_init(gmu);
>>   }
>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.h b/drivers/gpu/drm/msm/adreno/a6xx_gmu.h
>> index 88f18ea6a38a08b5b171709e5020010947a5d347..bdfc106cb3a578c90d7cd84f7d4fe228d761a994 100644
>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.h
>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.h
>> @@ -21,6 +21,15 @@ struct a6xx_gmu_bo {
>>   
>>   #define GMU_MAX_GX_FREQS	16
>>   #define GMU_MAX_CX_FREQS	4
>> +#define GMU_MAX_BCMS		3
>> +
>> +struct a6xx_bcm {
>> +	char *name;
>> +	unsigned int buswidth;
>> +	bool fixed;
>> +	unsigned int perfmode;
>> +	unsigned int perfmode_bw;
>> +};
>>   
>>   /*
>>    * These define the different GMU wake up options - these define how both the
>> @@ -85,6 +94,11 @@ struct a6xx_gmu {
>>   	unsigned long gpu_freqs[GMU_MAX_GX_FREQS];
>>   	u32 gx_arc_votes[GMU_MAX_GX_FREQS];
>>   
>> +	int nr_gpu_bws;
>> +	unsigned long gpu_bw_table[GMU_MAX_GX_FREQS];
>> +	u32 gpu_ib_votes[GMU_MAX_GX_FREQS][GMU_MAX_BCMS];
>> +	u16 gpu_ab_votes[GMU_MAX_GX_FREQS];
>> +
>>   	int nr_gmu_freqs;
>>   	unsigned long gmu_freqs[GMU_MAX_CX_FREQS];
>>   	u32 cx_arc_votes[GMU_MAX_CX_FREQS];
>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
>> index 4aceffb6aae89c781facc2a6e4a82b20b341b6cb..9201a53dd341bf432923ffb44947e015208a3d02 100644
>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
>> @@ -44,6 +44,7 @@ struct a6xx_info {
>>   	u32 gmu_chipid;
>>   	u32 gmu_cgc_mode;
>>   	u32 prim_fifo_threshold;
>> +	const struct a6xx_bcm *bcms;
>>   };
>>   
>>   struct a6xx_gpu {
>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_hfi.h b/drivers/gpu/drm/msm/adreno/a6xx_hfi.h
>> index 528110169398f69f16443a29a1594d19c36fb595..52ba4a07d7b9a709289acd244a751ace9bdaab5d 100644
>> --- a/drivers/gpu/drm/msm/adreno/a6xx_hfi.h
>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_hfi.h
>> @@ -173,6 +173,11 @@ struct a6xx_hfi_gx_bw_perf_vote_cmd {
>>   	u32 bw;
>>   };
>>   
>> +#define AB_VOTE_MASK		GENMASK(31, 16)
>> +#define MAX_AB_VOTE		(FIELD_MAX(AB_VOTE_MASK) - 1)
>> +#define AB_VOTE(vote)		FIELD_PREP(AB_VOTE_MASK, (vote))
>> +#define AB_VOTE_ENABLE		BIT(8)
>> +
>>   #define HFI_H2F_MSG_PREPARE_SLUMBER 33
>>   
>>   struct a6xx_hfi_prep_slumber_cmd {
>>
> 


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 5/7] drm/msm: adreno: enable GMU bandwidth for A740 and A750
  2024-11-29 15:25   ` Konrad Dybcio
@ 2024-12-02  8:47     ` Neil Armstrong
  0 siblings, 0 replies; 26+ messages in thread
From: Neil Armstrong @ 2024-12-02  8:47 UTC (permalink / raw)
  To: Konrad Dybcio, Rob Clark, Sean Paul, Konrad Dybcio, Abhinav Kumar,
	Dmitry Baryshkov, Marijn Suijten, David Airlie, Simona Vetter,
	Bjorn Andersson, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Akhil P Oommen
  Cc: linux-arm-msm, dri-devel, freedreno, linux-kernel, devicetree

On 29/11/2024 16:25, Konrad Dybcio wrote:
> On 28.11.2024 11:25 AM, Neil Armstrong wrote:
>> Now all the DDR bandwidth voting via the GPU Management Unit (GMU)
>> is in place, declare the Bus Control Modules (BCMs) and the
>> corresponding parameters in the GPU info struct and add the
>> GMU_BW_VOTE feature bit to enable it.
>>
>> Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
>> ---
>>   drivers/gpu/drm/msm/adreno/a6xx_catalog.c | 22 ++++++++++++++++++++++
>>   1 file changed, 22 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_catalog.c b/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
>> index 0c560e84ad5a53bb4e8a49ba4e153ce9cf33f7ae..edffb7737a97b268bb2986d557969e651988a344 100644
>> --- a/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
>> @@ -1388,6 +1388,17 @@ static const struct adreno_info a7xx_gpus[] = {
>>   			.pwrup_reglist = &a7xx_pwrup_reglist,
>>   			.gmu_chipid = 0x7020100,
>>   			.gmu_cgc_mode = 0x00020202,
>> +			.bcms = (const struct a6xx_bcm[]) {
>> +				{ .name = "SH0", .buswidth = 16 },
>> +				{ .name = "MC0", .buswidth = 4 },
>> +				{
>> +					.name = "ACV",
>> +					.fixed = true,
>> +					.perfmode = BIT(3),
>> +					.perfmode_bw = 16500000,
>> +				},
>> +				{ /* sentinel */ },
>> +			},
> 
> This is not going to fly the second there's two SoCs implementing the
> same GPU with a difference in bus topology. I think we could add
> something like drvdata to ICC nodes and use it for BCMs on icc-rpmh.
> Then, we could retrieve it from the interconnect path we get from the
> dt node. It would also reduce duplication.

I don't want to go into that; we can optimize this when adding topologies
for other GPUs later. As-is, this is a pointer, so we can already share the
same table between GPUs.

> 
> Konrad


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 2/7] drm/msm: adreno: add plumbing to generate bandwidth vote table for GMU
  2024-12-02  8:46     ` Neil Armstrong
@ 2024-12-04 15:35       ` Neil Armstrong
  2024-12-04 19:15         ` Akhil P Oommen
  2024-12-04 18:43       ` Akhil P Oommen
  1 sibling, 1 reply; 26+ messages in thread
From: Neil Armstrong @ 2024-12-04 15:35 UTC (permalink / raw)
  To: Akhil P Oommen, Rob Clark, Sean Paul, Konrad Dybcio,
	Abhinav Kumar, Dmitry Baryshkov, Marijn Suijten, David Airlie,
	Simona Vetter, Bjorn Andersson, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley
  Cc: linux-arm-msm, dri-devel, freedreno, linux-kernel, devicetree

On 02/12/2024 09:46, Neil Armstrong wrote:
> On 30/11/2024 22:49, Akhil P Oommen wrote:
>> On 11/28/2024 3:55 PM, Neil Armstrong wrote:
>>> The Adreno GPU Management Unit (GMU) can also scale DDR Bandwidth along
>>> the Frequency and Power Domain level, but by default we leave the
>>> OPP core scale the interconnect ddr path.
>>>
>>> While scaling via the interconnect path was sufficient, newer GPUs
>>> like the A750 require specific vote parameters and bandwidth to
>>> achieve full functionality.
>>>
>>> In order to calculate vote values used by the GPU Management
>>> Unit (GMU), we need to parse all the possible OPP Bandwidths and
>>> create a vote value to be sent to the appropriate Bus Control
>>> Modules (BCMs) declared in the GPU info struct.
>>>
>>> This vote value is called IB, while on the other side the GMU also
>>> takes another vote called AB which is a 16bit quantized value
>>> of the bandwidth against the maximum supported bandwidth.
>>>
>>> The vote array will then be used to dynamically generate the GMU
>>> bw_table sent during the GMU power-up.
>>>
>>> Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
>>> ---
>>>   drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 174 ++++++++++++++++++++++++++++++++++
>>>   drivers/gpu/drm/msm/adreno/a6xx_gmu.h |  14 +++
>>>   drivers/gpu/drm/msm/adreno/a6xx_gpu.h |   1 +
>>>   drivers/gpu/drm/msm/adreno/a6xx_hfi.h |   5 +
>>>   4 files changed, 194 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
>>> index 14db7376c712d19446b38152e480bd5a1e0a5198..ee2010a01186721dd377f1655fcf05ddaff77131 100644
>>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
>>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
>>> @@ -9,6 +9,7 @@
>>>   #include <linux/pm_domain.h>
>>>   #include <linux/pm_opp.h>
>>>   #include <soc/qcom/cmd-db.h>
>>> +#include <soc/qcom/tcs.h>
>>>   #include <drm/drm_gem.h>
>>>   #include "a6xx_gpu.h"
>>> @@ -1287,6 +1288,131 @@ static int a6xx_gmu_memory_probe(struct a6xx_gmu *gmu)
>>>       return 0;
>>>   }
>>> +/**
>>> + * struct bcm_db - Auxiliary data pertaining to each Bus Clock Manager (BCM)
>>> + * @unit: divisor used to convert bytes/sec bw value to an RPMh msg
>>> + * @width: multiplier used to convert bytes/sec bw value to an RPMh msg
>>> + * @vcd: virtual clock domain that this bcm belongs to
>>> + * @reserved: reserved field
>>> + */
>>> +struct bcm_db {
>>> +    __le32 unit;
>>> +    __le16 width;
>>> +    u8 vcd;
>>> +    u8 reserved;
>>> +};
>>> +
>>> +static u64 bcm_div(u64 num, u32 base)
>>> +{
>>> +    /* Ensure that small votes aren't lost. */
>>> +    if (num && num < base)
>>> +        return 1;
>>> +
>>> +    do_div(num, base);
>>> +
>>> +    return num;
>>> +}
>>> +
>>> +static int a6xx_gmu_rpmh_bw_votes_init(const struct a6xx_info *info,
>>> +                       struct a6xx_gmu *gmu)
>>> +{
>>> +    const struct bcm_db *bcm_data[GMU_MAX_BCMS] = { 0 };
>>> +    unsigned int bcm_index, bw_index, bcm_count = 0;
>>> +
>>> +    if (!info->bcms)
>>> +        return 0;
>>> +
>>> +    /* Retrieve BCM data from cmd-db */
>>> +    for (bcm_index = 0; bcm_index < GMU_MAX_BCMS; bcm_index++) {
>>> +        size_t count;
>>> +
>>> +        /* Stop at first unconfigured bcm */
>>> +        if (!info->bcms[bcm_index].name)
>>> +            break;
>>> +
>>> +        bcm_data[bcm_index] = cmd_db_read_aux_data(
>>> +                        info->bcms[bcm_index].name,
>>> +                        &count);
>>> +        if (IS_ERR(bcm_data[bcm_index]))
>>> +            return PTR_ERR(bcm_data[bcm_index]);
>>> +
>>> +        if (!count)
>>> +            return -EINVAL;
>>> +
>>> +        ++bcm_count;
>>> +    }
>>> +
>>> +    /* Generate BCM votes values for each bandwidth & BCM */
>>> +    for (bw_index = 0; bw_index < gmu->nr_gpu_bws; bw_index++) {
>>> +        u32 *data = gmu->gpu_ib_votes[bw_index];
>>> +        u32 bw = gmu->gpu_bw_table[bw_index];
>>> +
>>> +        /* Calculations loosely copied from bcm_aggregate() & tcs_cmd_gen() */
>>> +        for (bcm_index = 0; bcm_index < bcm_count; bcm_index++) {
>>> +            bool commit = false;
>>> +            u64 peak, vote;
>>> +            u16 width;
>>> +            u32 unit;
>>> +
>>> +            /* Skip unconfigured BCM */
>>> +            if (!bcm_data[bcm_index])
>>> +                continue;
>>> +
>>> +            if (bcm_index == bcm_count - 1 ||
>>> +                (bcm_data[bcm_index + 1] &&
>>> +                 bcm_data[bcm_index]->vcd != bcm_data[bcm_index + 1]->vcd))
>>> +                commit = true;
>>> +
>>> +            if (!bw) {
>>> +                data[bcm_index] = BCM_TCS_CMD(commit, false, 0, 0);
>>> +                continue;
>>> +            }
>>> +
>>> +            if (info->bcms[bcm_index].fixed) {
>>> +                u32 perfmode = 0;
>>> +
>>> +                if (bw >= info->bcms[bcm_index].perfmode_bw)
>>> +                    perfmode = info->bcms[bcm_index].perfmode;
>>> +
>>> +                data[bcm_index] = BCM_TCS_CMD(commit, true, 0, perfmode);
>>> +                continue;
>>> +            }
>>> +
>>> +            /* Multiply the bandwidth by the width of the connection */
>>> +            width = le16_to_cpu(bcm_data[bcm_index]->width);
>>> +            peak = bcm_div((u64)bw * width, info->bcms[bcm_index].buswidth);
>>> +
>>> +            /* Input bandwidth value is in KBps, scale the value to BCM unit */
>>> +            unit = le32_to_cpu(bcm_data[bcm_index]->unit);
>>> +            vote = bcm_div(peak * 1000ULL, unit);
>>> +
>>> +            if (vote > BCM_TCS_CMD_VOTE_MASK)
>>> +                vote = BCM_TCS_CMD_VOTE_MASK;
>>
>> use clamp()?
> 
> Yep, I think I could replace bcm_div with clamp
> 
>>
>>> +
>>> +            data[bcm_index] = BCM_TCS_CMD(commit, true, vote, vote);
>>> +        }
>>> +    }
>>> +
>>> +    /* Generate AB votes which are a quantized bandwidth value */
>>> +    for (bw_index = 0; bw_index < gmu->nr_gpu_bws; bw_index++) {
>>> +        u64 tmp;
>>> +
>>> +        /*
>>> +         * The AB vote consists of a 16 bit wide quantized level
>>> +         * against the maximum supported bandwidth.
>>> +         * Quantization can be calculated as below:
>>> +         * vote = (bandwidth * 2^16) / max bandwidth
>>> +         */
>>> +        tmp = gmu->gpu_bw_table[bw_index] * MAX_AB_VOTE;
>>> +
>>> +        /* Divide by the maximum bandwidth to get a quantized value */
>>> +        gmu->gpu_ab_votes[bw_index] =
>>> +            bcm_div(tmp, gmu->gpu_bw_table[gmu->nr_gpu_bws - 1]);
>>> +    }
>>
>> So I suppose you are trying to vote AB equal to IB. The aggregation logic
>> for the two is different, so this will make DDR scale very aggressively. A
>> more reasonable approach would be to vote a % of the IB vote (25%?). Ideally
>> we should measure the GPU's bandwidth usage and vote that if you are really
>> concerned about stability issues. IB is just a floor vote; the GPU can
>> generate way higher traffic.
> 
> I think this should be optimized further in a different patchset, so I would
> like to make the simplest vote possible here to retain functionality.
> 
> So if I understand correctly, I should divide this vote value by 4? Downstream uses 25% by default
> when no AB was calculated, but what does that mean exactly?
> 
> Is there a counter somewhere to measure the GPU traffic?


What if we also specified opp-avg-kBps for each OPP, set it to the downstream "qcom,bus-min" value,
and used it for the AB vote? And used 25% as a fallback, like downstream.
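
Something like this, as a rough sketch (gpu_avg_bw_table[] would be a new,
hypothetical field filled from opp-avg-kBps, e.g. via dev_pm_opp_get_bw(opp,
false, 0), assuming that helper returns the average bandwidth when peak is
false):

		/* Sketch: prefer the OPP's average bandwidth for AB, else 25% of peak */
		u32 avg = gmu->gpu_avg_bw_table[bw_index];
		u32 peak = gmu->gpu_bw_table[bw_index];

		if (!avg)
			avg = peak / 4;

		tmp = (u64)avg * MAX_AB_VOTE;
		gmu->gpu_ab_votes[bw_index] =
			bcm_div(tmp, gmu->gpu_bw_table[gmu->nr_gpu_bws - 1]);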

Neil

> 
>>
>>> +
>>> +    return 0;
>>> +}
>>> +
>>>   /* Return the 'arc-level' for the given frequency */
>>>   static unsigned int a6xx_gmu_get_arc_level(struct device *dev,
>>>                          unsigned long freq)
>>> @@ -1390,12 +1516,15 @@ static int a6xx_gmu_rpmh_arc_votes_init(struct device *dev, u32 *votes,
>>>    * The GMU votes with the RPMh for itself and on behalf of the GPU but we need
>>>    * to construct the list of votes on the CPU and send it over. Query the RPMh
>>>    * voltage levels and build the votes
>>> + * The GMU can also vote for DDR interconnects, use the OPP bandwidth entries
>>> + * and BCM parameters to build the votes.
>>>    */
>>>   static int a6xx_gmu_rpmh_votes_init(struct a6xx_gmu *gmu)
>>>   {
>>>       struct a6xx_gpu *a6xx_gpu = container_of(gmu, struct a6xx_gpu, gmu);
>>>       struct adreno_gpu *adreno_gpu = &a6xx_gpu->base;
>>> +    const struct a6xx_info *info = adreno_gpu->info->a6xx;
>>>       struct msm_gpu *gpu = &adreno_gpu->base;
>>>       int ret;
>>> @@ -1407,6 +1536,10 @@ static int a6xx_gmu_rpmh_votes_init(struct a6xx_gmu *gmu)
>>>       ret |= a6xx_gmu_rpmh_arc_votes_init(gmu->dev, gmu->cx_arc_votes,
>>>           gmu->gmu_freqs, gmu->nr_gmu_freqs, "cx.lvl");
>>> +    /* Build the interconnect votes */
>>> +    if (info->bcms && gmu->nr_gpu_bws > 1)
>>> +        ret |= a6xx_gmu_rpmh_bw_votes_init(info, gmu);
>>> +
>>>       return ret;
>>>   }
>>> @@ -1442,10 +1575,43 @@ static int a6xx_gmu_build_freq_table(struct device *dev, unsigned long *freqs,
>>>       return index;
>>>   }
>>> +static int a6xx_gmu_build_bw_table(struct device *dev, unsigned long *bandwidths,
>>> +        u32 size)
>>> +{
>>> +    int count = dev_pm_opp_get_opp_count(dev);
>>
>> I am less concerned about this now since you are not voting real AB BW.
> 
> Sorry I don't understand
> 
> Thanks,
> Neil
> 
>>
>> -Akhil.
>>
>>> +    struct dev_pm_opp *opp;
>>> +    int i, index = 0;
>>> +    unsigned int bandwidth = 1;
>>> +
>>> +    /*
>>> +     * The OPP table doesn't contain the "off" bandwidth level so we need to
>>> +     * add 1 to the table size to account for it
>>> +     */
>>> +
>>> +    if (WARN(count + 1 > size,
>>> +        "The GMU bandwidth table is being truncated\n"))
>>> +        count = size - 1;
>>> +
>>> +    /* Set the "off" bandwidth */
>>> +    bandwidths[index++] = 0;
>>> +
>>> +    for (i = 0; i < count; i++) {
>>> +        opp = dev_pm_opp_find_bw_ceil(dev, &bandwidth, 0);
>>> +        if (IS_ERR(opp))
>>> +            break;
>>> +
>>> +        dev_pm_opp_put(opp);
>>> +        bandwidths[index++] = bandwidth++;
>>> +    }
>>> +
>>> +    return index;
>>> +}
>>> +
>>>   static int a6xx_gmu_pwrlevels_probe(struct a6xx_gmu *gmu)
>>>   {
>>>       struct a6xx_gpu *a6xx_gpu = container_of(gmu, struct a6xx_gpu, gmu);
>>>       struct adreno_gpu *adreno_gpu = &a6xx_gpu->base;
>>> +    const struct a6xx_info *info = adreno_gpu->info->a6xx;
>>>       struct msm_gpu *gpu = &adreno_gpu->base;
>>>       int ret = 0;
>>> @@ -1472,6 +1638,14 @@ static int a6xx_gmu_pwrlevels_probe(struct a6xx_gmu *gmu)
>>>       gmu->current_perf_index = gmu->nr_gpu_freqs - 1;
>>> +    /*
>>> +     * The GMU also handles GPU Interconnect Votes so build a list
>>> +     * of DDR bandwidths from the GPU OPP table
>>> +     */
>>> +    if (info->bcms)
>>> +        gmu->nr_gpu_bws = a6xx_gmu_build_bw_table(&gpu->pdev->dev,
>>> +            gmu->gpu_bw_table, ARRAY_SIZE(gmu->gpu_bw_table));
>>> +
>>>       /* Build the list of RPMh votes that we'll send to the GMU */
>>>       return a6xx_gmu_rpmh_votes_init(gmu);
>>>   }
>>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.h b/drivers/gpu/drm/msm/adreno/a6xx_gmu.h
>>> index 88f18ea6a38a08b5b171709e5020010947a5d347..bdfc106cb3a578c90d7cd84f7d4fe228d761a994 100644
>>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.h
>>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.h
>>> @@ -21,6 +21,15 @@ struct a6xx_gmu_bo {
>>>   #define GMU_MAX_GX_FREQS    16
>>>   #define GMU_MAX_CX_FREQS    4
>>> +#define GMU_MAX_BCMS        3
>>> +
>>> +struct a6xx_bcm {
>>> +    char *name;
>>> +    unsigned int buswidth;
>>> +    bool fixed;
>>> +    unsigned int perfmode;
>>> +    unsigned int perfmode_bw;
>>> +};
>>>   /*
>>>    * These define the different GMU wake up options - these define how both the
>>> @@ -85,6 +94,11 @@ struct a6xx_gmu {
>>>       unsigned long gpu_freqs[GMU_MAX_GX_FREQS];
>>>       u32 gx_arc_votes[GMU_MAX_GX_FREQS];
>>> +    int nr_gpu_bws;
>>> +    unsigned long gpu_bw_table[GMU_MAX_GX_FREQS];
>>> +    u32 gpu_ib_votes[GMU_MAX_GX_FREQS][GMU_MAX_BCMS];
>>> +    u16 gpu_ab_votes[GMU_MAX_GX_FREQS];
>>> +
>>>       int nr_gmu_freqs;
>>>       unsigned long gmu_freqs[GMU_MAX_CX_FREQS];
>>>       u32 cx_arc_votes[GMU_MAX_CX_FREQS];
>>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
>>> index 4aceffb6aae89c781facc2a6e4a82b20b341b6cb..9201a53dd341bf432923ffb44947e015208a3d02 100644
>>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
>>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
>>> @@ -44,6 +44,7 @@ struct a6xx_info {
>>>       u32 gmu_chipid;
>>>       u32 gmu_cgc_mode;
>>>       u32 prim_fifo_threshold;
>>> +    const struct a6xx_bcm *bcms;
>>>   };
>>>   struct a6xx_gpu {
>>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_hfi.h b/drivers/gpu/drm/msm/adreno/a6xx_hfi.h
>>> index 528110169398f69f16443a29a1594d19c36fb595..52ba4a07d7b9a709289acd244a751ace9bdaab5d 100644
>>> --- a/drivers/gpu/drm/msm/adreno/a6xx_hfi.h
>>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_hfi.h
>>> @@ -173,6 +173,11 @@ struct a6xx_hfi_gx_bw_perf_vote_cmd {
>>>       u32 bw;
>>>   };
>>> +#define AB_VOTE_MASK        GENMASK(31, 16)
>>> +#define MAX_AB_VOTE        (FIELD_MAX(AB_VOTE_MASK) - 1)
>>> +#define AB_VOTE(vote)        FIELD_PREP(AB_VOTE_MASK, (vote))
>>> +#define AB_VOTE_ENABLE        BIT(8)
>>> +
>>>   #define HFI_H2F_MSG_PREPARE_SLUMBER 33
>>>   struct a6xx_hfi_prep_slumber_cmd {
>>>
>>
> 



* Re: [PATCH v3 2/7] drm/msm: adreno: add plumbing to generate bandwidth vote table for GMU
  2024-12-02  8:46     ` Neil Armstrong
  2024-12-04 15:35       ` Neil Armstrong
@ 2024-12-04 18:43       ` Akhil P Oommen
  1 sibling, 0 replies; 26+ messages in thread
From: Akhil P Oommen @ 2024-12-04 18:43 UTC (permalink / raw)
  To: neil.armstrong, Rob Clark, Sean Paul, Konrad Dybcio,
	Abhinav Kumar, Dmitry Baryshkov, Marijn Suijten, David Airlie,
	Simona Vetter, Bjorn Andersson, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley
  Cc: linux-arm-msm, dri-devel, freedreno, linux-kernel, devicetree

On 12/2/2024 2:16 PM, Neil Armstrong wrote:
> On 30/11/2024 22:49, Akhil P Oommen wrote:
>> On 11/28/2024 3:55 PM, Neil Armstrong wrote:
>>> The Adreno GPU Management Unit (GMU) can also scale DDR Bandwidth along
>>> the Frequency and Power Domain level, but by default we leave the
>>> OPP core scale the interconnect ddr path.
>>>
>>> While scaling via the interconnect path was sufficient, newer GPUs
>>> like the A750 require specific vote parameters and bandwidth to
>>> achieve full functionality.
>>>
>>> In order to calculate vote values used by the GPU Management
>>> Unit (GMU), we need to parse all the possible OPP Bandwidths and
>>> create a vote value to be sent to the appropriate Bus Control
>>> Modules (BCMs) declared in the GPU info struct.
>>>
>>> This vote value is called IB, while on the other side the GMU also
>>> takes another vote called AB which is a 16-bit quantized value
>>> of the bandwidth against the maximum supported bandwidth.
>>>
>>> The vote array will then be used to dynamically generate the GMU
>>> bw_table sent during the GMU power-up.
>>>
>>> Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
>>> ---
>>>   drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 174 +++++++++++++++++++++++
>>> +++++++++++
>>>   drivers/gpu/drm/msm/adreno/a6xx_gmu.h |  14 +++
>>>   drivers/gpu/drm/msm/adreno/a6xx_gpu.h |   1 +
>>>   drivers/gpu/drm/msm/adreno/a6xx_hfi.h |   5 +
>>>   4 files changed, 194 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c b/drivers/gpu/drm/
>>> msm/adreno/a6xx_gmu.c
>>> index
>>> 14db7376c712d19446b38152e480bd5a1e0a5198..ee2010a01186721dd377f1655fcf05ddaff77131 100644
>>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
>>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
>>> @@ -9,6 +9,7 @@
>>>   #include <linux/pm_domain.h>
>>>   #include <linux/pm_opp.h>
>>>   #include <soc/qcom/cmd-db.h>
>>> +#include <soc/qcom/tcs.h>
>>>   #include <drm/drm_gem.h>
>>>     #include "a6xx_gpu.h"
>>> @@ -1287,6 +1288,131 @@ static int a6xx_gmu_memory_probe(struct
>>> a6xx_gmu *gmu)
>>>       return 0;
>>>   }
>>>   +/**
>>> + * struct bcm_db - Auxiliary data pertaining to each Bus Clock
>>> Manager (BCM)
>>> + * @unit: divisor used to convert bytes/sec bw value to an RPMh msg
>>> + * @width: multiplier used to convert bytes/sec bw value to an RPMh msg
>>> + * @vcd: virtual clock domain that this bcm belongs to
>>> + * @reserved: reserved field
>>> + */
>>> +struct bcm_db {
>>> +    __le32 unit;
>>> +    __le16 width;
>>> +    u8 vcd;
>>> +    u8 reserved;
>>> +};
>>> +
>>> +static u64 bcm_div(u64 num, u32 base)
>>> +{
>>> +    /* Ensure that small votes aren't lost. */
>>> +    if (num && num < base)
>>> +        return 1;
>>> +
>>> +    do_div(num, base);
>>> +
>>> +    return num;
>>> +}
>>> +
>>> +static int a6xx_gmu_rpmh_bw_votes_init(const struct a6xx_info *info,
>>> +                       struct a6xx_gmu *gmu)
>>> +{
>>> +    const struct bcm_db *bcm_data[GMU_MAX_BCMS] = { 0 };
>>> +    unsigned int bcm_index, bw_index, bcm_count = 0;
>>> +
>>> +    if (!info->bcms)
>>> +        return 0;
>>> +
>>> +    /* Retrieve BCM data from cmd-db */
>>> +    for (bcm_index = 0; bcm_index < GMU_MAX_BCMS; bcm_index++) {
>>> +        size_t count;
>>> +
>>> +        /* Stop at first unconfigured bcm */
>>> +        if (!info->bcms[bcm_index].name)
>>> +            break;
>>> +
>>> +        bcm_data[bcm_index] = cmd_db_read_aux_data(
>>> +                        info->bcms[bcm_index].name,
>>> +                        &count);
>>> +        if (IS_ERR(bcm_data[bcm_index]))
>>> +            return PTR_ERR(bcm_data[bcm_index]);
>>> +
>>> +        if (!count)
>>> +            return -EINVAL;
>>> +
>>> +        ++bcm_count;
>>> +    }
>>> +
>>> +    /* Generate BCM votes values for each bandwidth & BCM */
>>> +    for (bw_index = 0; bw_index < gmu->nr_gpu_bws; bw_index++) {
>>> +        u32 *data = gmu->gpu_ib_votes[bw_index];
>>> +        u32 bw = gmu->gpu_bw_table[bw_index];
>>> +
>>> +        /* Calculations loosely copied from bcm_aggregate() &
>>> tcs_cmd_gen() */
>>> +        for (bcm_index = 0; bcm_index < bcm_count; bcm_index++) {
>>> +            bool commit = false;
>>> +            u64 peak, vote;
>>> +            u16 width;
>>> +            u32 unit;
>>> +
>>> +            /* Skip unconfigured BCM */
>>> +            if (!bcm_data[bcm_index])
>>> +                continue;
>>> +
>>> +            if (bcm_index == bcm_count - 1 ||
>>> +                (bcm_data[bcm_index + 1] &&
>>> +                 bcm_data[bcm_index]->vcd != bcm_data[bcm_index +
>>> 1]->vcd))
>>> +                commit = true;
>>> +
>>> +            if (!bw) {
>>> +                data[bcm_index] = BCM_TCS_CMD(commit, false, 0, 0);
>>> +                continue;
>>> +            }
>>> +
>>> +            if (info->bcms[bcm_index].fixed) {
>>> +                u32 perfmode = 0;
>>> +
>>> +                if (bw >= info->bcms[bcm_index].perfmode_bw)
>>> +                    perfmode = info->bcms[bcm_index].perfmode;
>>> +
>>> +                data[bcm_index] = BCM_TCS_CMD(commit, true, 0,
>>> perfmode);
>>> +                continue;
>>> +            }
>>> +
>>> +            /* Multiply the bandwidth by the width of the connection */
>>> +            width = le16_to_cpu(bcm_data[bcm_index]->width);
>>> +            peak = bcm_div((u64)bw * width, info-
>>> >bcms[bcm_index].buswidth);
>>> +
>>> +            /* Input bandwidth value is in KBps, scale the value to
>>> BCM unit */
>>> +            unit = le32_to_cpu(bcm_data[bcm_index]->unit);
>>> +            vote = bcm_div(peak * 1000ULL, unit);
>>> +
>>> +            if (vote > BCM_TCS_CMD_VOTE_MASK)
>>> +                vote = BCM_TCS_CMD_VOTE_MASK;
>>
>> use clamp()?
> 
> Yep, I think I could replace bcm_div with clamp
> 
>>
>>> +
>>> +            data[bcm_index] = BCM_TCS_CMD(commit, true, vote, vote);
>>> +        }
>>> +    }
>>> +
>>> +    /* Generate AB votes which are a quantitized bandwidth value */
>>> +    for (bw_index = 0; bw_index < gmu->nr_gpu_bws; bw_index++) {
>>> +        u64 tmp;
>>> +
>>> +        /*
>>> +         * The AB vote consists of a 16 bit wide quantized level
>>> +         * against the maximum supported bandwidth.
>>> +         * Quantization can be calculated as below:
>>> +         * vote = (bandwidth * 2^16) / max bandwidth
>>> +         */
>>> +        tmp = gmu->gpu_bw_table[bw_index] * MAX_AB_VOTE;
>>> +
>>> +        /* Divide by the maximum bandwidth to get a quantized value */
>>> +        gmu->gpu_ab_votes[bw_index] =
>>> +            bcm_div(tmp, gmu->gpu_bw_table[gmu->nr_gpu_bws - 1]);
>>> +    }
>>
>> So I suppose you are trying to vote AB equal to IB. Aggregation logic
>> for both is different. So this will make DDR scale very aggressively. A
>> more reasonable approach would be to vote a % of IB vote (25%?). Ideally
>> we should measure GPU's bandwidth usage and vote that if you are really
>> concerned about stability issues. IB is just a floor vote, GPU can
>> generate way higher traffic.
> 
> I think this should be optimized better in a different patchset, so I would
> like to make the simplest vote possible here to retain functionality.
> 
> So if I understand correctly, I should divide this vote value by 4? Downstream uses
> 25% by default
> when no AB was calculated, but what does that mean exactly ?
> 
> Is there a counter somewhere to measure the GPU traffic ?
> 

Agree about splitting out the optimizations to a separate series. Better
to merge things as early as possible, in small chunks.

In this case, the AB vote is so large that it makes the IB vote
irrelevant. You can get an idea of the aggregation logic if you check the
icc driver a bit. 25% was just a random number that came to my mind; the
idea was to contribute something to the AB vote instead of nothing.
Blindly making the AB vote equal to the IB vote would push DDR to its
maximum frequency most of the time, and a higher DDR frequency increases
the overall SoC power consumption a lot.

Ideally, for the GPU we should measure and vote the actual bandwidth
usage, because the GPU's bandwidth usage varies a lot with the workload.
There are a couple of GBIF counters that expose the BW usage; just check
"ram cycles" in kgsl. That can be a separate series.


>>
>>> +
>>> +    return 0;
>>> +}
>>> +
>>>   /* Return the 'arc-level' for the given frequency */
>>>   static unsigned int a6xx_gmu_get_arc_level(struct device *dev,
>>>                          unsigned long freq)
>>> @@ -1390,12 +1516,15 @@ static int
>>> a6xx_gmu_rpmh_arc_votes_init(struct device *dev, u32 *votes,
>>>    * The GMU votes with the RPMh for itself and on behalf of the GPU
>>> but we need
>>>    * to construct the list of votes on the CPU and send it over.
>>> Query the RPMh
>>>    * voltage levels and build the votes
>>> + * The GMU can also vote for DDR interconnects, use the OPP
>>> bandwidth entries
>>> + * and BCM parameters to build the votes.
>>>    */
>>>     static int a6xx_gmu_rpmh_votes_init(struct a6xx_gmu *gmu)
>>>   {
>>>       struct a6xx_gpu *a6xx_gpu = container_of(gmu, struct a6xx_gpu,
>>> gmu);
>>>       struct adreno_gpu *adreno_gpu = &a6xx_gpu->base;
>>> +    const struct a6xx_info *info = adreno_gpu->info->a6xx;
>>>       struct msm_gpu *gpu = &adreno_gpu->base;
>>>       int ret;
>>>   @@ -1407,6 +1536,10 @@ static int a6xx_gmu_rpmh_votes_init(struct
>>> a6xx_gmu *gmu)
>>>       ret |= a6xx_gmu_rpmh_arc_votes_init(gmu->dev, gmu->cx_arc_votes,
>>>           gmu->gmu_freqs, gmu->nr_gmu_freqs, "cx.lvl");
>>>   +    /* Build the interconnect votes */
>>> +    if (info->bcms && gmu->nr_gpu_bws > 1)
>>> +        ret |= a6xx_gmu_rpmh_bw_votes_init(info, gmu);
>>> +
>>>       return ret;
>>>   }
>>>   @@ -1442,10 +1575,43 @@ static int a6xx_gmu_build_freq_table(struct
>>> device *dev, unsigned long *freqs,
>>>       return index;
>>>   }
>>>   +static int a6xx_gmu_build_bw_table(struct device *dev, unsigned
>>> long *bandwidths,
>>> +        u32 size)
>>> +{
>>> +    int count = dev_pm_opp_get_opp_count(dev);
>>
>> I am less concerned about this now since you are not voting real AB BW.
> 
> Sorry I don't understand
> 

I mean that here the maximum AB vote is capped to the maximum IB vote in
the DDR table. When you vote a real, measured AB on some SoC SKUs, there
may be scenarios where it is higher than the IB vote. That never happens
when the AB is some % of the IB.

-Akhil

> Thanks,
> Neil
> 
>>
>> -Akhil.
>>
>>> +    struct dev_pm_opp *opp;
>>> +    int i, index = 0;
>>> +    unsigned int bandwidth = 1;
>>> +
>>> +    /*
>>> +     * The OPP table doesn't contain the "off" bandwidth level so we
>>> need to
>>> +     * add 1 to the table size to account for it
>>> +     */
>>> +
>>> +    if (WARN(count + 1 > size,
>>> +        "The GMU bandwidth table is being truncated\n"))
>>> +        count = size - 1;
>>> +
>>> +    /* Set the "off" bandwidth */
>>> +    bandwidths[index++] = 0;
>>> +
>>> +    for (i = 0; i < count; i++) {
>>> +        opp = dev_pm_opp_find_bw_ceil(dev, &bandwidth, 0);
>>> +        if (IS_ERR(opp))
>>> +            break;
>>> +
>>> +        dev_pm_opp_put(opp);
>>> +        bandwidths[index++] = bandwidth++;
>>> +    }
>>> +
>>> +    return index;
>>> +}
>>> +
>>>   static int a6xx_gmu_pwrlevels_probe(struct a6xx_gmu *gmu)
>>>   {
>>>       struct a6xx_gpu *a6xx_gpu = container_of(gmu, struct a6xx_gpu,
>>> gmu);
>>>       struct adreno_gpu *adreno_gpu = &a6xx_gpu->base;
>>> +    const struct a6xx_info *info = adreno_gpu->info->a6xx;
>>>       struct msm_gpu *gpu = &adreno_gpu->base;
>>>         int ret = 0;
>>> @@ -1472,6 +1638,14 @@ static int a6xx_gmu_pwrlevels_probe(struct
>>> a6xx_gmu *gmu)
>>>         gmu->current_perf_index = gmu->nr_gpu_freqs - 1;
>>>   +    /*
>>> +     * The GMU also handles GPU Interconnect Votes so build a list
>>> +     * of DDR bandwidths from the GPU OPP table
>>> +     */
>>> +    if (info->bcms)
>>> +        gmu->nr_gpu_bws = a6xx_gmu_build_bw_table(&gpu->pdev->dev,
>>> +            gmu->gpu_bw_table, ARRAY_SIZE(gmu->gpu_bw_table));
>>> +
>>>       /* Build the list of RPMh votes that we'll send to the GMU */
>>>       return a6xx_gmu_rpmh_votes_init(gmu);
>>>   }
>>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.h b/drivers/gpu/drm/
>>> msm/adreno/a6xx_gmu.h
>>> index
>>> 88f18ea6a38a08b5b171709e5020010947a5d347..bdfc106cb3a578c90d7cd84f7d4fe228d761a994 100644
>>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.h
>>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.h
>>> @@ -21,6 +21,15 @@ struct a6xx_gmu_bo {
>>>     #define GMU_MAX_GX_FREQS    16
>>>   #define GMU_MAX_CX_FREQS    4
>>> +#define GMU_MAX_BCMS        3
>>> +
>>> +struct a6xx_bcm {
>>> +    char *name;
>>> +    unsigned int buswidth;
>>> +    bool fixed;
>>> +    unsigned int perfmode;
>>> +    unsigned int perfmode_bw;
>>> +};
>>>     /*
>>>    * These define the different GMU wake up options - these define
>>> how both the
>>> @@ -85,6 +94,11 @@ struct a6xx_gmu {
>>>       unsigned long gpu_freqs[GMU_MAX_GX_FREQS];
>>>       u32 gx_arc_votes[GMU_MAX_GX_FREQS];
>>>   +    int nr_gpu_bws;
>>> +    unsigned long gpu_bw_table[GMU_MAX_GX_FREQS];
>>> +    u32 gpu_ib_votes[GMU_MAX_GX_FREQS][GMU_MAX_BCMS];
>>> +    u16 gpu_ab_votes[GMU_MAX_GX_FREQS];
>>> +
>>>       int nr_gmu_freqs;
>>>       unsigned long gmu_freqs[GMU_MAX_CX_FREQS];
>>>       u32 cx_arc_votes[GMU_MAX_CX_FREQS];
>>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h b/drivers/gpu/drm/
>>> msm/adreno/a6xx_gpu.h
>>> index
>>> 4aceffb6aae89c781facc2a6e4a82b20b341b6cb..9201a53dd341bf432923ffb44947e015208a3d02 100644
>>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
>>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
>>> @@ -44,6 +44,7 @@ struct a6xx_info {
>>>       u32 gmu_chipid;
>>>       u32 gmu_cgc_mode;
>>>       u32 prim_fifo_threshold;
>>> +    const struct a6xx_bcm *bcms;
>>>   };
>>>     struct a6xx_gpu {
>>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_hfi.h b/drivers/gpu/drm/
>>> msm/adreno/a6xx_hfi.h
>>> index
>>> 528110169398f69f16443a29a1594d19c36fb595..52ba4a07d7b9a709289acd244a751ace9bdaab5d 100644
>>> --- a/drivers/gpu/drm/msm/adreno/a6xx_hfi.h
>>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_hfi.h
>>> @@ -173,6 +173,11 @@ struct a6xx_hfi_gx_bw_perf_vote_cmd {
>>>       u32 bw;
>>>   };
>>>   +#define AB_VOTE_MASK        GENMASK(31, 16)
>>> +#define MAX_AB_VOTE        (FIELD_MAX(AB_VOTE_MASK) - 1)
>>> +#define AB_VOTE(vote)        FIELD_PREP(AB_VOTE_MASK, (vote))
>>> +#define AB_VOTE_ENABLE        BIT(8)
>>> +
>>>   #define HFI_H2F_MSG_PREPARE_SLUMBER 33
>>>     struct a6xx_hfi_prep_slumber_cmd {
>>>
>>
> 



* Re: [PATCH v3 2/7] drm/msm: adreno: add plumbing to generate bandwidth vote table for GMU
  2024-12-04 15:35       ` Neil Armstrong
@ 2024-12-04 19:15         ` Akhil P Oommen
  0 siblings, 0 replies; 26+ messages in thread
From: Akhil P Oommen @ 2024-12-04 19:15 UTC (permalink / raw)
  To: neil.armstrong, Rob Clark, Sean Paul, Konrad Dybcio,
	Abhinav Kumar, Dmitry Baryshkov, Marijn Suijten, David Airlie,
	Simona Vetter, Bjorn Andersson, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley
  Cc: linux-arm-msm, dri-devel, freedreno, linux-kernel, devicetree

On 12/4/2024 9:05 PM, Neil Armstrong wrote:
> On 02/12/2024 09:46, Neil Armstrong wrote:
>> On 30/11/2024 22:49, Akhil P Oommen wrote:
>>> On 11/28/2024 3:55 PM, Neil Armstrong wrote:
>>>> The Adreno GPU Management Unit (GMU) can also scale DDR Bandwidth along
>>>> the Frequency and Power Domain level, but by default we leave the
>>>> OPP core scale the interconnect ddr path.
>>>>
>>>> While scaling via the interconnect path was sufficient, newer GPUs
>>>> like the A750 require specific vote parameters and bandwidth to
>>>> achieve full functionality.
>>>>
>>>> In order to calculate vote values used by the GPU Management
>>>> Unit (GMU), we need to parse all the possible OPP Bandwidths and
>>>> create a vote value to be sent to the appropriate Bus Control
>>>> Modules (BCMs) declared in the GPU info struct.
>>>>
>>>> This vote value is called IB, while on the other side the GMU also
>>>> takes another vote called AB which is a 16-bit quantized value
>>>> of the bandwidth against the maximum supported bandwidth.
>>>>
>>>> The vote array will then be used to dynamically generate the GMU
>>>> bw_table sent during the GMU power-up.
>>>>
>>>> Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
>>>> ---
>>>>   drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 174 ++++++++++++++++++++++
>>>> ++++++++++++
>>>>   drivers/gpu/drm/msm/adreno/a6xx_gmu.h |  14 +++
>>>>   drivers/gpu/drm/msm/adreno/a6xx_gpu.h |   1 +
>>>>   drivers/gpu/drm/msm/adreno/a6xx_hfi.h |   5 +
>>>>   4 files changed, 194 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c b/drivers/gpu/
>>>> drm/msm/adreno/a6xx_gmu.c
>>>> index
>>>> 14db7376c712d19446b38152e480bd5a1e0a5198..ee2010a01186721dd377f1655fcf05ddaff77131 100644
>>>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
>>>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
>>>> @@ -9,6 +9,7 @@
>>>>   #include <linux/pm_domain.h>
>>>>   #include <linux/pm_opp.h>
>>>>   #include <soc/qcom/cmd-db.h>
>>>> +#include <soc/qcom/tcs.h>
>>>>   #include <drm/drm_gem.h>
>>>>   #include "a6xx_gpu.h"
>>>> @@ -1287,6 +1288,131 @@ static int a6xx_gmu_memory_probe(struct
>>>> a6xx_gmu *gmu)
>>>>       return 0;
>>>>   }
>>>> +/**
>>>> + * struct bcm_db - Auxiliary data pertaining to each Bus Clock
>>>> Manager (BCM)
>>>> + * @unit: divisor used to convert bytes/sec bw value to an RPMh msg
>>>> + * @width: multiplier used to convert bytes/sec bw value to an RPMh
>>>> msg
>>>> + * @vcd: virtual clock domain that this bcm belongs to
>>>> + * @reserved: reserved field
>>>> + */
>>>> +struct bcm_db {
>>>> +    __le32 unit;
>>>> +    __le16 width;
>>>> +    u8 vcd;
>>>> +    u8 reserved;
>>>> +};
>>>> +
>>>> +static u64 bcm_div(u64 num, u32 base)
>>>> +{
>>>> +    /* Ensure that small votes aren't lost. */
>>>> +    if (num && num < base)
>>>> +        return 1;
>>>> +
>>>> +    do_div(num, base);
>>>> +
>>>> +    return num;
>>>> +}
>>>> +
>>>> +static int a6xx_gmu_rpmh_bw_votes_init(const struct a6xx_info *info,
>>>> +                       struct a6xx_gmu *gmu)
>>>> +{
>>>> +    const struct bcm_db *bcm_data[GMU_MAX_BCMS] = { 0 };
>>>> +    unsigned int bcm_index, bw_index, bcm_count = 0;
>>>> +
>>>> +    if (!info->bcms)
>>>> +        return 0;
>>>> +
>>>> +    /* Retrieve BCM data from cmd-db */
>>>> +    for (bcm_index = 0; bcm_index < GMU_MAX_BCMS; bcm_index++) {
>>>> +        size_t count;
>>>> +
>>>> +        /* Stop at first unconfigured bcm */
>>>> +        if (!info->bcms[bcm_index].name)
>>>> +            break;
>>>> +
>>>> +        bcm_data[bcm_index] = cmd_db_read_aux_data(
>>>> +                        info->bcms[bcm_index].name,
>>>> +                        &count);
>>>> +        if (IS_ERR(bcm_data[bcm_index]))
>>>> +            return PTR_ERR(bcm_data[bcm_index]);
>>>> +
>>>> +        if (!count)
>>>> +            return -EINVAL;
>>>> +
>>>> +        ++bcm_count;
>>>> +    }
>>>> +
>>>> +    /* Generate BCM votes values for each bandwidth & BCM */
>>>> +    for (bw_index = 0; bw_index < gmu->nr_gpu_bws; bw_index++) {
>>>> +        u32 *data = gmu->gpu_ib_votes[bw_index];
>>>> +        u32 bw = gmu->gpu_bw_table[bw_index];
>>>> +
>>>> +        /* Calculations loosely copied from bcm_aggregate() &
>>>> tcs_cmd_gen() */
>>>> +        for (bcm_index = 0; bcm_index < bcm_count; bcm_index++) {
>>>> +            bool commit = false;
>>>> +            u64 peak, vote;
>>>> +            u16 width;
>>>> +            u32 unit;
>>>> +
>>>> +            /* Skip unconfigured BCM */
>>>> +            if (!bcm_data[bcm_index])
>>>> +                continue;
>>>> +
>>>> +            if (bcm_index == bcm_count - 1 ||
>>>> +                (bcm_data[bcm_index + 1] &&
>>>> +                 bcm_data[bcm_index]->vcd != bcm_data[bcm_index +
>>>> 1]->vcd))
>>>> +                commit = true;
>>>> +
>>>> +            if (!bw) {
>>>> +                data[bcm_index] = BCM_TCS_CMD(commit, false, 0, 0);
>>>> +                continue;
>>>> +            }
>>>> +
>>>> +            if (info->bcms[bcm_index].fixed) {
>>>> +                u32 perfmode = 0;
>>>> +
>>>> +                if (bw >= info->bcms[bcm_index].perfmode_bw)
>>>> +                    perfmode = info->bcms[bcm_index].perfmode;
>>>> +
>>>> +                data[bcm_index] = BCM_TCS_CMD(commit, true, 0,
>>>> perfmode);
>>>> +                continue;
>>>> +            }
>>>> +
>>>> +            /* Multiply the bandwidth by the width of the
>>>> connection */
>>>> +            width = le16_to_cpu(bcm_data[bcm_index]->width);
>>>> +            peak = bcm_div((u64)bw * width, info-
>>>> >bcms[bcm_index].buswidth);
>>>> +
>>>> +            /* Input bandwidth value is in KBps, scale the value to
>>>> BCM unit */
>>>> +            unit = le32_to_cpu(bcm_data[bcm_index]->unit);
>>>> +            vote = bcm_div(peak * 1000ULL, unit);
>>>> +
>>>> +            if (vote > BCM_TCS_CMD_VOTE_MASK)
>>>> +                vote = BCM_TCS_CMD_VOTE_MASK;
>>>
>>> use clamp()?
>>
>> Yep, I think I could replace bcm_div with clamp
>>
>>>
>>>> +
>>>> +            data[bcm_index] = BCM_TCS_CMD(commit, true, vote, vote);
>>>> +        }
>>>> +    }
>>>> +
>>>> +    /* Generate AB votes which are a quantitized bandwidth value */
>>>> +    for (bw_index = 0; bw_index < gmu->nr_gpu_bws; bw_index++) {
>>>> +        u64 tmp;
>>>> +
>>>> +        /*
>>>> +         * The AB vote consists of a 16 bit wide quantized level
>>>> +         * against the maximum supported bandwidth.
>>>> +         * Quantization can be calculated as below:
>>>> +         * vote = (bandwidth * 2^16) / max bandwidth
>>>> +         */
>>>> +        tmp = gmu->gpu_bw_table[bw_index] * MAX_AB_VOTE;
>>>> +
>>>> +        /* Divide by the maximum bandwidth to get a quantized value */
>>>> +        gmu->gpu_ab_votes[bw_index] =
>>>> +            bcm_div(tmp, gmu->gpu_bw_table[gmu->nr_gpu_bws - 1]);
>>>> +    }
>>>
>>> So I suppose you are trying to vote AB equal to IB. Aggregation logic
>>> for both is different. So this will make DDR scale very aggressively. A
>>> more reasonable approach would be to vote a % of IB vote (25%?). Ideally
>>> we should measure GPU's bandwidth usage and vote that if you are really
>>> concerned about stability issues. IB is just a floor vote, GPU can
>>> generate way higher traffic.
>>
>> I think this should be optimized better in a different patchset, so I
>> would
>> like to make the simplest vote possible here to retain functionality.
>>
>> So if I understand correctly, I should divide this vote value by 4? Downstream
>> uses 25% by default
>> when no AB was calculated, but what does that mean exactly ?
>>
>> Is there a counter somewhere to measure the GPU traffic ?
> 
> 
> What if we also specified opp-avg-kBps for each OPP, set from the
> downstream "qcom,bus-min" value, and used it for the AB vote, falling
> back to 25% like downstream does?

A fixed AB vote for each OPP is a temporary measure, so I feel we should
not mess with the DT for this.

-Akhil.

> 
> Neil
> 
>>
>>>
>>>> +
>>>> +    return 0;
>>>> +}
>>>> +
>>>>   /* Return the 'arc-level' for the given frequency */
>>>>   static unsigned int a6xx_gmu_get_arc_level(struct device *dev,
>>>>                          unsigned long freq)
>>>> @@ -1390,12 +1516,15 @@ static int
>>>> a6xx_gmu_rpmh_arc_votes_init(struct device *dev, u32 *votes,
>>>>    * The GMU votes with the RPMh for itself and on behalf of the GPU
>>>> but we need
>>>>    * to construct the list of votes on the CPU and send it over.
>>>> Query the RPMh
>>>>    * voltage levels and build the votes
>>>> + * The GMU can also vote for DDR interconnects, use the OPP
>>>> bandwidth entries
>>>> + * and BCM parameters to build the votes.
>>>>    */
>>>>   static int a6xx_gmu_rpmh_votes_init(struct a6xx_gmu *gmu)
>>>>   {
>>>>       struct a6xx_gpu *a6xx_gpu = container_of(gmu, struct a6xx_gpu,
>>>> gmu);
>>>>       struct adreno_gpu *adreno_gpu = &a6xx_gpu->base;
>>>> +    const struct a6xx_info *info = adreno_gpu->info->a6xx;
>>>>       struct msm_gpu *gpu = &adreno_gpu->base;
>>>>       int ret;
>>>> @@ -1407,6 +1536,10 @@ static int a6xx_gmu_rpmh_votes_init(struct
>>>> a6xx_gmu *gmu)
>>>>       ret |= a6xx_gmu_rpmh_arc_votes_init(gmu->dev, gmu->cx_arc_votes,
>>>>           gmu->gmu_freqs, gmu->nr_gmu_freqs, "cx.lvl");
>>>> +    /* Build the interconnect votes */
>>>> +    if (info->bcms && gmu->nr_gpu_bws > 1)
>>>> +        ret |= a6xx_gmu_rpmh_bw_votes_init(info, gmu);
>>>> +
>>>>       return ret;
>>>>   }
>>>> @@ -1442,10 +1575,43 @@ static int a6xx_gmu_build_freq_table(struct
>>>> device *dev, unsigned long *freqs,
>>>>       return index;
>>>>   }
>>>> +static int a6xx_gmu_build_bw_table(struct device *dev, unsigned
>>>> long *bandwidths,
>>>> +        u32 size)
>>>> +{
>>>> +    int count = dev_pm_opp_get_opp_count(dev);
>>>
>>> I am less concerned about this now since you are not voting real AB BW.
>>
>> Sorry I don't understand
>>
>> Thanks,
>> Neil
>>
>>>
>>> -Akhil.
>>>
>>>> +    struct dev_pm_opp *opp;
>>>> +    int i, index = 0;
>>>> +    unsigned int bandwidth = 1;
>>>> +
>>>> +    /*
>>>> +     * The OPP table doesn't contain the "off" bandwidth level so
>>>> we need to
>>>> +     * add 1 to the table size to account for it
>>>> +     */
>>>> +
>>>> +    if (WARN(count + 1 > size,
>>>> +        "The GMU bandwidth table is being truncated\n"))
>>>> +        count = size - 1;
>>>> +
>>>> +    /* Set the "off" bandwidth */
>>>> +    bandwidths[index++] = 0;
>>>> +
>>>> +    for (i = 0; i < count; i++) {
>>>> +        opp = dev_pm_opp_find_bw_ceil(dev, &bandwidth, 0);
>>>> +        if (IS_ERR(opp))
>>>> +            break;
>>>> +
>>>> +        dev_pm_opp_put(opp);
>>>> +        bandwidths[index++] = bandwidth++;
>>>> +    }
>>>> +
>>>> +    return index;
>>>> +}
>>>> +
>>>>   static int a6xx_gmu_pwrlevels_probe(struct a6xx_gmu *gmu)
>>>>   {
>>>>       struct a6xx_gpu *a6xx_gpu = container_of(gmu, struct a6xx_gpu,
>>>> gmu);
>>>>       struct adreno_gpu *adreno_gpu = &a6xx_gpu->base;
>>>> +    const struct a6xx_info *info = adreno_gpu->info->a6xx;
>>>>       struct msm_gpu *gpu = &adreno_gpu->base;
>>>>       int ret = 0;
>>>> @@ -1472,6 +1638,14 @@ static int a6xx_gmu_pwrlevels_probe(struct
>>>> a6xx_gmu *gmu)
>>>>       gmu->current_perf_index = gmu->nr_gpu_freqs - 1;
>>>> +    /*
>>>> +     * The GMU also handles GPU Interconnect Votes so build a list
>>>> +     * of DDR bandwidths from the GPU OPP table
>>>> +     */
>>>> +    if (info->bcms)
>>>> +        gmu->nr_gpu_bws = a6xx_gmu_build_bw_table(&gpu->pdev->dev,
>>>> +            gmu->gpu_bw_table, ARRAY_SIZE(gmu->gpu_bw_table));
>>>> +
>>>>       /* Build the list of RPMh votes that we'll send to the GMU */
>>>>       return a6xx_gmu_rpmh_votes_init(gmu);
>>>>   }
>>>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.h b/drivers/gpu/
>>>> drm/msm/adreno/a6xx_gmu.h
>>>> index
>>>> 88f18ea6a38a08b5b171709e5020010947a5d347..bdfc106cb3a578c90d7cd84f7d4fe228d761a994 100644
>>>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.h
>>>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.h
>>>> @@ -21,6 +21,15 @@ struct a6xx_gmu_bo {
>>>>   #define GMU_MAX_GX_FREQS    16
>>>>   #define GMU_MAX_CX_FREQS    4
>>>> +#define GMU_MAX_BCMS        3
>>>> +
>>>> +struct a6xx_bcm {
>>>> +    char *name;
>>>> +    unsigned int buswidth;
>>>> +    bool fixed;
>>>> +    unsigned int perfmode;
>>>> +    unsigned int perfmode_bw;
>>>> +};
>>>>   /*
>>>>    * These define the different GMU wake up options - these define
>>>> how both the
>>>> @@ -85,6 +94,11 @@ struct a6xx_gmu {
>>>>       unsigned long gpu_freqs[GMU_MAX_GX_FREQS];
>>>>       u32 gx_arc_votes[GMU_MAX_GX_FREQS];
>>>> +    int nr_gpu_bws;
>>>> +    unsigned long gpu_bw_table[GMU_MAX_GX_FREQS];
>>>> +    u32 gpu_ib_votes[GMU_MAX_GX_FREQS][GMU_MAX_BCMS];
>>>> +    u16 gpu_ab_votes[GMU_MAX_GX_FREQS];
>>>> +
>>>>       int nr_gmu_freqs;
>>>>       unsigned long gmu_freqs[GMU_MAX_CX_FREQS];
>>>>       u32 cx_arc_votes[GMU_MAX_CX_FREQS];
>>>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h b/drivers/gpu/
>>>> drm/msm/adreno/a6xx_gpu.h
>>>> index
>>>> 4aceffb6aae89c781facc2a6e4a82b20b341b6cb..9201a53dd341bf432923ffb44947e015208a3d02 100644
>>>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
>>>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
>>>> @@ -44,6 +44,7 @@ struct a6xx_info {
>>>>       u32 gmu_chipid;
>>>>       u32 gmu_cgc_mode;
>>>>       u32 prim_fifo_threshold;
>>>> +    const struct a6xx_bcm *bcms;
>>>>   };
>>>>   struct a6xx_gpu {
>>>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_hfi.h b/drivers/gpu/
>>>> drm/msm/adreno/a6xx_hfi.h
>>>> index
>>>> 528110169398f69f16443a29a1594d19c36fb595..52ba4a07d7b9a709289acd244a751ace9bdaab5d 100644
>>>> --- a/drivers/gpu/drm/msm/adreno/a6xx_hfi.h
>>>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_hfi.h
>>>> @@ -173,6 +173,11 @@ struct a6xx_hfi_gx_bw_perf_vote_cmd {
>>>>       u32 bw;
>>>>   };
>>>> +#define AB_VOTE_MASK        GENMASK(31, 16)
>>>> +#define MAX_AB_VOTE        (FIELD_MAX(AB_VOTE_MASK) - 1)
>>>> +#define AB_VOTE(vote)        FIELD_PREP(AB_VOTE_MASK, (vote))
>>>> +#define AB_VOTE_ENABLE        BIT(8)
>>>> +
>>>>   #define HFI_H2F_MSG_PREPARE_SLUMBER 33
>>>>   struct a6xx_hfi_prep_slumber_cmd {
>>>>
>>>
>>
> 



end of thread, other threads:[~2024-12-04 19:15 UTC | newest]

Thread overview: 26+ messages
2024-11-28 10:25 [PATCH v3 0/7] drm/msm: adreno: add support for DDR bandwidth scaling via GMU Neil Armstrong
2024-11-28 10:25 ` [PATCH v3 1/7] drm/msm: adreno: add defines for gpu & gmu frequency table sizes Neil Armstrong
2024-11-28 13:24   ` Dmitry Baryshkov
2024-11-30 20:39   ` Akhil P Oommen
2024-11-28 10:25 ` [PATCH v3 2/7] drm/msm: adreno: add plumbing to generate bandwidth vote table for GMU Neil Armstrong
2024-11-29 15:21   ` Konrad Dybcio
2024-12-02  8:41     ` Neil Armstrong
2024-11-30 21:49   ` Akhil P Oommen
2024-12-02  8:46     ` Neil Armstrong
2024-12-04 15:35       ` Neil Armstrong
2024-12-04 19:15         ` Akhil P Oommen
2024-12-04 18:43       ` Akhil P Oommen
2024-11-28 10:25 ` [PATCH v3 3/7] drm/msm: adreno: dynamically generate GMU bw table Neil Armstrong
2024-11-29 16:56   ` Konrad Dybcio
2024-11-28 10:25 ` [PATCH v3 4/7] drm/msm: adreno: find bandwidth index of OPP and set it along freq index Neil Armstrong
2024-11-29 15:33   ` Konrad Dybcio
2024-11-30 22:02     ` Akhil P Oommen
2024-11-28 10:25 ` [PATCH v3 5/7] drm/msm: adreno: enable GMU bandwidth for A740 and A750 Neil Armstrong
2024-11-28 13:30   ` Dmitry Baryshkov
2024-11-29 15:25   ` Konrad Dybcio
2024-12-02  8:47     ` Neil Armstrong
2024-11-28 10:25 ` [PATCH v3 6/7] arm64: qcom: dts: sm8550: add interconnect and opp-peak-kBps for GPU Neil Armstrong
2024-11-28 13:26   ` Dmitry Baryshkov
2024-11-28 14:16     ` Neil Armstrong
2024-11-28 10:25 ` [PATCH v3 7/7] arm64: qcom: dts: sm8650: " Neil Armstrong
2024-11-28 13:26   ` Dmitry Baryshkov
