[PATCH v4 00/11] Preemption support for A7XX

* [PATCH v4 00/11] Preemption support for A7XX
@ 2024-09-17 11:14 Antonino Maniscalco
  2024-09-17 11:14 ` [PATCH v4 01/11] drm/msm: Fix bv_fence being used as bv_rptr Antonino Maniscalco
                   ` (11 more replies)
  0 siblings, 12 replies; 35+ messages in thread
From: Antonino Maniscalco @ 2024-09-17 11:14 UTC
  To: Rob Clark, Sean Paul, Konrad Dybcio, Abhinav Kumar,
	Dmitry Baryshkov, Marijn Suijten, David Airlie, Daniel Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet
  Cc: linux-arm-msm, dri-devel, freedreno, linux-kernel, linux-doc,
	Antonino Maniscalco, Akhil P Oommen, Neil Armstrong,
	Sharat Masetty

This series implements preemption for A7XX targets, which allows the GPU to
switch to a higher priority ring when work is pushed to it, reducing latency
for high priority submissions.

This series enables L1 preemption with skip_save_restore, which requires
the following userspace patches to function:

https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30544

A flag is added to `msm_submitqueue_create` to only allow submissions
from compatible userspace to be preempted, therefore maintaining
compatibility.
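
As a rough illustration of the opt-in described above, the sketch below
models how a submitqueue-create request carrying the preemption flag might
be validated. Everything prefixed `sketch_`/`SKETCH_` is a stand-in invented
for this note (including the flag value and the rejection of unknown flag
bits); it is not the UAPI definition from `include/uapi/drm/msm_drm.h`.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the new flag; the real value lives in msm_drm.h. */
#define SKETCH_SUBMITQUEUE_ALLOW_PREEMPT 0x1u
#define SKETCH_SUBMITQUEUE_FLAGS         (SKETCH_SUBMITQUEUE_ALLOW_PREEMPT)

/* Hypothetical, trimmed-down request: only the fields the sketch needs. */
struct sketch_submitqueue_req {
	uint32_t flags;
	uint32_t prio;
};

/*
 * Returns 0 if the request is acceptable, -EINVAL otherwise.
 * On success, *preemptible tells whether submissions from this queue
 * may be preempted.
 */
static int sketch_submitqueue_check(const struct sketch_submitqueue_req *req,
				    bool gpu_supports_preemption,
				    bool *preemptible)
{
	assert(req && preemptible);

	/* Assumed behavior: unknown flag bits are rejected */
	if (req->flags & ~SKETCH_SUBMITQUEUE_FLAGS)
		return -EINVAL;

	/* Opting in on a target without preemption support is rejected */
	if ((req->flags & SKETCH_SUBMITQUEUE_ALLOW_PREEMPT) &&
	    !gpu_supports_preemption)
		return -EINVAL;

	*preemptible = (req->flags & SKETCH_SUBMITQUEUE_ALLOW_PREEMPT) != 0;
	return 0;
}
```

Userspace that never sets the flag keeps its old behavior: the request is
accepted and its submissions are simply never preempted.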

Preemption is currently enabled by default only on A750. It can be
enabled on other targets through the `enable_preemption` module
parameter; it stays off by default there because more testing is required.

To test on other HW it is sufficient to set that parameter to 1. Then,
using the mesa branch linked above, `TU_DEBUG=hiprio` runs any
application at high priority, so that its submissions preempt those
from other applications.
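
The interaction between the module parameter and the per-target default can
be pictured roughly as follows. This is only a sketch under assumptions: the
cover letter states that a value of 1 enables preemption, while the encoding
of "follow the target default" as a negative value and the quirk bit name
are made up here for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical quirk bit marking targets where preemption is on by default. */
#define SKETCH_QUIRK_PREEMPTION (1u << 0)

/*
 * param <  0: follow the per-target default (assumed encoding)
 * param == 0: force preemption off
 * param >  0: force preemption on
 */
static bool sketch_preemption_enabled(int param, unsigned int quirks)
{
	if (param < 0)
		return (quirks & SKETCH_QUIRK_PREEMPTION) != 0;
	return param != 0;
}

/* Preemption needs more than one ring; without it a single ring suffices. */
static unsigned int sketch_nr_rings(int param, unsigned int quirks,
				    unsigned int max_rings)
{
	assert(max_rings >= 1);
	return sketch_preemption_enabled(param, quirks) ? max_rings : 1;
}
```

The series itself treats `gpu->nr_rings > 1` as "preemption is enabled",
which is why the sketch expresses the decision as a ring count.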

The `msm_gpu_preemption_trigger` and `msm_gpu_preemption_irq` traces
added in this series can be used to observe preemption behavior and to
measure preemption latency.

Some commits from this series are based on a previous series to enable
preemption on A6XX targets:

https://lkml.kernel.org/1520489185-21828-1-git-send-email-smasetty@codeaurora.org

Signed-off-by: Antonino Maniscalco <antomani103@gmail.com>
---
Changes in v4:
- Added missing register in pwrup list
- Removed and rearranged barriers
- Renamed `skip_inline_wptr` to `restore_wptr`
- Track ctx seqno per ring
- Removed secure preempt context
- NOP out postamble to disable it instantly
- Only emit pwrup reglist once
- Document bv_rptr_addr
- Removed unused A6XX_PREEMPT_USER_RECORD_SIZE
- Set name on preempt record buffer
- Link to v3: https://lore.kernel.org/r/20240905-preemption-a750-t-v3-0-fd947699f7bc@gmail.com

Changes in v3:
- Added documentation about preemption
- Use quirks to determine which target supports preemption
- Add a module parameter to force disabling or enabling preemption
- Clear postamble when profiling
- Define A6XX_CP_CONTEXT_SWITCH_CNTL_LEVEL fields in a6xx.xml
- Make preemption records MAP_PRIV
- Removed user ctx record (NON_PRIV) and patch 2/9 as it's not needed
  anymore
- Link to v2: https://lore.kernel.org/r/20240830-preemption-a750-t-v2-0-86aeead2cd80@gmail.com

Changes in v2:
- Added preempt_record_size for X185 in PATCH 3/7
- Added patches to reset perf counters
- Dropped unused defines
- Dropped unused variable (fixes warning)
- Only enable preemption on a750
- Reject MSM_SUBMITQUEUE_ALLOW_PREEMPT for unsupported targets
- Added Akhil's Reviewed-By tags to patches 1/9,2/9,3/9
- Added Neil's Tested-By tags
- Added explanation for UAPI changes in commit message
- Link to v1: https://lore.kernel.org/r/20240815-preemption-a750-t-v1-0-7bda26c34037@gmail.com

---
Antonino Maniscalco (11):
      drm/msm: Fix bv_fence being used as bv_rptr
      drm/msm/A6XX: Track current_ctx_seqno per ring
      drm/msm: Add a `preempt_record_size` field
      drm/msm: Add CONTEXT_SWITCH_CNTL bitfields
      drm/msm/A6xx: Implement preemption for A7XX targets
      drm/msm/A6xx: Sync relevant adreno_pm4.xml changes
      drm/msm/A6xx: Use posamble to reset counters on preemption
      drm/msm/A6xx: Add traces for preemption
      drm/msm/A6XX: Add a flag to allow preemption to submitqueue_create
      drm/msm/A6xx: Enable preemption for A750
      Documentation: document adreno preemption

 Documentation/gpu/msm-preemption.rst               |  98 +++++
 drivers/gpu/drm/msm/Makefile                       |   1 +
 drivers/gpu/drm/msm/adreno/a2xx_gpu.c              |   2 +-
 drivers/gpu/drm/msm/adreno/a3xx_gpu.c              |   2 +-
 drivers/gpu/drm/msm/adreno/a4xx_gpu.c              |   2 +-
 drivers/gpu/drm/msm/adreno/a5xx_gpu.c              |   6 +-
 drivers/gpu/drm/msm/adreno/a6xx_catalog.c          |   7 +-
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c              | 325 ++++++++++++++-
 drivers/gpu/drm/msm/adreno/a6xx_gpu.h              | 174 ++++++++
 drivers/gpu/drm/msm/adreno/a6xx_preempt.c          | 440 +++++++++++++++++++++
 drivers/gpu/drm/msm/adreno/adreno_gpu.h            |   9 +-
 drivers/gpu/drm/msm/msm_drv.c                      |   4 +
 drivers/gpu/drm/msm/msm_gpu.c                      |   2 +-
 drivers/gpu/drm/msm/msm_gpu.h                      |  11 -
 drivers/gpu/drm/msm/msm_gpu_trace.h                |  28 ++
 drivers/gpu/drm/msm/msm_ringbuffer.h               |  18 +
 drivers/gpu/drm/msm/msm_submitqueue.c              |   3 +
 drivers/gpu/drm/msm/registers/adreno/a6xx.xml      |   7 +-
 .../gpu/drm/msm/registers/adreno/adreno_pm4.xml    |  39 +-
 include/uapi/drm/msm_drm.h                         |   5 +-
 20 files changed, 1117 insertions(+), 66 deletions(-)
---
base-commit: 7c626ce4bae1ac14f60076d00eafe71af30450ba
change-id: 20240815-preemption-a750-t-fcee9a844b39

Best regards,
-- 
Antonino Maniscalco <antomani103@gmail.com>


* [PATCH v4 01/11] drm/msm: Fix bv_fence being used as bv_rptr
  2024-09-17 11:14 [PATCH v4 00/11] Preemption support for A7XX Antonino Maniscalco
@ 2024-09-17 11:14 ` Antonino Maniscalco
  2024-09-17 11:14 ` [PATCH v4 02/11] drm/msm/A6XX: Track current_ctx_seqno per ring Antonino Maniscalco
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 35+ messages in thread
From: Antonino Maniscalco @ 2024-09-17 11:14 UTC
  To: Rob Clark, Sean Paul, Konrad Dybcio, Abhinav Kumar,
	Dmitry Baryshkov, Marijn Suijten, David Airlie, Daniel Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet
  Cc: linux-arm-msm, dri-devel, freedreno, linux-kernel, linux-doc,
	Antonino Maniscalco, Akhil P Oommen, Neil Armstrong

The bv_fence field of rbmemptrs was being used incorrectly as the BV
rptr shadow pointer in some places.

Add a bv_rptr field and change the code to use that instead.

Signed-off-by: Antonino Maniscalco <antomani103@gmail.com>
Reviewed-by: Akhil P Oommen <quic_akhilpo@quicinc.com>
Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8650-QRD
---
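Note (illustration only, not part of the patch): the sketch below shows why
the two fields cannot share a slot. It uses a trimmed stand-in for
`struct msm_rbmemptrs` and a local macro that follows the same idea as the
driver's `rbmemptr()` (base iova plus member offset). All `sketch_` names
are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Trimmed stand-in for struct msm_rbmemptrs after this patch. */
struct sketch_rbmemptrs {
	volatile uint32_t rptr;
	volatile uint32_t fence;
	volatile uint32_t bv_rptr;   /* new: BV read pointer shadow */
	volatile uint32_t bv_fence;  /* BV fence value, a separate slot */
};

/* The shadow and the fence must never alias. */
static_assert(offsetof(struct sketch_rbmemptrs, bv_rptr) !=
	      offsetof(struct sketch_rbmemptrs, bv_fence),
	      "bv_rptr and bv_fence need distinct slots");

/* Base iova plus member offset, in the spirit of rbmemptr(). */
#define sketch_rbmemptr(iova, member) \
	((uint64_t)(iova) + offsetof(struct sketch_rbmemptrs, member))

/* Address that would be programmed as the BV rptr shadow. */
static uint64_t sketch_bv_rptr_addr(uint64_t memptrs_iova)
{
	return sketch_rbmemptr(memptrs_iova, bv_rptr);
}

/* Address the submit path writes the BV fence to. */
static uint64_t sketch_bv_fence_addr(uint64_t memptrs_iova)
{
	return sketch_rbmemptr(memptrs_iova, bv_fence);
}
```

With the old code both roles resolved to the `bv_fence` address, so a
hardware rptr update and a fence write could overwrite each other.
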
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 2 +-
 drivers/gpu/drm/msm/msm_ringbuffer.h  | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index bcaec86ac67a..32a4faa93d7f 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -1132,7 +1132,7 @@ static int hw_init(struct msm_gpu *gpu)
 	/* ..which means "always" on A7xx, also for BV shadow */
 	if (adreno_is_a7xx(adreno_gpu)) {
 		gpu_write64(gpu, REG_A7XX_CP_BV_RB_RPTR_ADDR,
-			    rbmemptr(gpu->rb[0], bv_fence));
+			    rbmemptr(gpu->rb[0], bv_rptr));
 	}
 
 	/* Always come up on rb 0 */
diff --git a/drivers/gpu/drm/msm/msm_ringbuffer.h b/drivers/gpu/drm/msm/msm_ringbuffer.h
index 0d6beb8cd39a..40791b2ade46 100644
--- a/drivers/gpu/drm/msm/msm_ringbuffer.h
+++ b/drivers/gpu/drm/msm/msm_ringbuffer.h
@@ -31,6 +31,7 @@ struct msm_rbmemptrs {
 	volatile uint32_t rptr;
 	volatile uint32_t fence;
 	/* Introduced on A7xx */
+	volatile uint32_t bv_rptr;
 	volatile uint32_t bv_fence;
 
 	volatile struct msm_gpu_submit_stats stats[MSM_GPU_SUBMIT_STATS_COUNT];

-- 
2.46.0


* [PATCH v4 02/11] drm/msm/A6XX: Track current_ctx_seqno per ring
  2024-09-17 11:14 [PATCH v4 00/11] Preemption support for A7XX Antonino Maniscalco
  2024-09-17 11:14 ` [PATCH v4 01/11] drm/msm: Fix bv_fence being used as bv_rptr Antonino Maniscalco
@ 2024-09-17 11:14 ` Antonino Maniscalco
  2024-09-17 11:14 ` [PATCH v4 03/11] drm/msm: Add a `preempt_record_size` field Antonino Maniscalco
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 35+ messages in thread
From: Antonino Maniscalco @ 2024-09-17 11:14 UTC
  To: Rob Clark, Sean Paul, Konrad Dybcio, Abhinav Kumar,
	Dmitry Baryshkov, Marijn Suijten, David Airlie, Daniel Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet
  Cc: linux-arm-msm, dri-devel, freedreno, linux-kernel, linux-doc,
	Antonino Maniscalco

With preemption it is not enough to track the current_ctx_seqno globally
as execution might switch between rings.

This is especially problematic when current_ctx_seqno is used to
determine whether a page table switch is necessary as it might lead to
security bugs.

Track current context per ring.

Signed-off-by: Antonino Maniscalco <antomani103@gmail.com>
---
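Note (illustration only, not part of the patch): a small model of the
failure mode described in the commit message. The types and helpers are
hypothetical; only the idea of comparing `ctx->seqno` against a "current"
value comes from the driver.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_NR_RINGS 2

struct sketch_seq_ring {
	int cur_ctx_seqno;  /* last context that submitted to this ring */
};

struct sketch_seq_gpu {
	int global_cur_ctx_seqno;                   /* old scheme */
	struct sketch_seq_ring rb[SKETCH_NR_RINGS]; /* new scheme */
};

/* Old check: one value shared by every ring. */
static bool sketch_needs_switch_global(const struct sketch_seq_gpu *gpu,
				       int ctx_seqno)
{
	return gpu->global_cur_ctx_seqno != ctx_seqno;
}

/* New check: compare against the ring the submit targets. */
static bool sketch_needs_switch_per_ring(const struct sketch_seq_gpu *gpu,
					 int ring_id, int ctx_seqno)
{
	assert(ring_id >= 0 && ring_id < SKETCH_NR_RINGS);
	return gpu->rb[ring_id].cur_ctx_seqno != ctx_seqno;
}

/* Record a submit from ctx_seqno on ring_id in both schemes. */
static void sketch_submit(struct sketch_seq_gpu *gpu, int ring_id,
			  int ctx_seqno)
{
	assert(ring_id >= 0 && ring_id < SKETCH_NR_RINGS);
	gpu->global_cur_ctx_seqno = ctx_seqno;
	gpu->rb[ring_id].cur_ctx_seqno = ctx_seqno;
}
```

Suppose context 1 submits on ring 0, context 2 submits on ring 1, and then
context 2 submits on ring 0. The global check sees "2 == 2" and skips the
page table switch, although the last switch emitted into ring 0 was for
context 1. The per-ring check still reports that a switch is needed.
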
 drivers/gpu/drm/msm/adreno/a2xx_gpu.c |  2 +-
 drivers/gpu/drm/msm/adreno/a3xx_gpu.c |  2 +-
 drivers/gpu/drm/msm/adreno/a4xx_gpu.c |  2 +-
 drivers/gpu/drm/msm/adreno/a5xx_gpu.c |  6 +++---
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 10 ++++++----
 drivers/gpu/drm/msm/msm_gpu.c         |  2 +-
 drivers/gpu/drm/msm/msm_gpu.h         | 11 -----------
 drivers/gpu/drm/msm/msm_ringbuffer.h  | 10 ++++++++++
 8 files changed, 23 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a2xx_gpu.c b/drivers/gpu/drm/msm/adreno/a2xx_gpu.c
index 0dc255ddf5ce..379a3d346c30 100644
--- a/drivers/gpu/drm/msm/adreno/a2xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a2xx_gpu.c
@@ -22,7 +22,7 @@ static void a2xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
 			break;
 		case MSM_SUBMIT_CMD_CTX_RESTORE_BUF:
 			/* ignore if there has not been a ctx switch: */
-			if (gpu->cur_ctx_seqno == submit->queue->ctx->seqno)
+			if (ring->cur_ctx_seqno == submit->queue->ctx->seqno)
 				break;
 			fallthrough;
 		case MSM_SUBMIT_CMD_BUF:
diff --git a/drivers/gpu/drm/msm/adreno/a3xx_gpu.c b/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
index 5273dc849838..945fe64f835c 100644
--- a/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
@@ -40,7 +40,7 @@ static void a3xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
 			break;
 		case MSM_SUBMIT_CMD_CTX_RESTORE_BUF:
 			/* ignore if there has not been a ctx switch: */
-			if (gpu->cur_ctx_seqno == submit->queue->ctx->seqno)
+			if (ring->cur_ctx_seqno == submit->queue->ctx->seqno)
 				break;
 			fallthrough;
 		case MSM_SUBMIT_CMD_BUF:
diff --git a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
index 8b4cdf95f445..50c490b492f0 100644
--- a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
@@ -34,7 +34,7 @@ static void a4xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
 			break;
 		case MSM_SUBMIT_CMD_CTX_RESTORE_BUF:
 			/* ignore if there has not been a ctx switch: */
-			if (gpu->cur_ctx_seqno == submit->queue->ctx->seqno)
+			if (ring->cur_ctx_seqno == submit->queue->ctx->seqno)
 				break;
 			fallthrough;
 		case MSM_SUBMIT_CMD_BUF:
diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
index c0b5373e90d7..80b441fe8e3a 100644
--- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
@@ -75,7 +75,7 @@ static void a5xx_submit_in_rb(struct msm_gpu *gpu, struct msm_gem_submit *submit
 		case MSM_SUBMIT_CMD_IB_TARGET_BUF:
 			break;
 		case MSM_SUBMIT_CMD_CTX_RESTORE_BUF:
-			if (gpu->cur_ctx_seqno == submit->queue->ctx->seqno)
+			if (ring->cur_ctx_seqno == submit->queue->ctx->seqno)
 				break;
 			fallthrough;
 		case MSM_SUBMIT_CMD_BUF:
@@ -129,7 +129,7 @@ static void a5xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
 	unsigned int i, ibs = 0;
 
 	if (IS_ENABLED(CONFIG_DRM_MSM_GPU_SUDO) && submit->in_rb) {
-		gpu->cur_ctx_seqno = 0;
+		ring->cur_ctx_seqno = 0;
 		a5xx_submit_in_rb(gpu, submit);
 		return;
 	}
@@ -164,7 +164,7 @@ static void a5xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
 		case MSM_SUBMIT_CMD_IB_TARGET_BUF:
 			break;
 		case MSM_SUBMIT_CMD_CTX_RESTORE_BUF:
-			if (gpu->cur_ctx_seqno == submit->queue->ctx->seqno)
+			if (ring->cur_ctx_seqno == submit->queue->ctx->seqno)
 				break;
 			fallthrough;
 		case MSM_SUBMIT_CMD_BUF:
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index 32a4faa93d7f..6e065500b64d 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -109,7 +109,7 @@ static void a6xx_set_pagetable(struct a6xx_gpu *a6xx_gpu,
 	u32 asid;
 	u64 memptr = rbmemptr(ring, ttbr0);
 
-	if (ctx->seqno == a6xx_gpu->base.base.cur_ctx_seqno)
+	if (ctx->seqno == ring->cur_ctx_seqno)
 		return;
 
 	if (msm_iommu_pagetable_params(ctx->aspace->mmu, &ttbr, &asid))
@@ -219,7 +219,7 @@ static void a6xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
 		case MSM_SUBMIT_CMD_IB_TARGET_BUF:
 			break;
 		case MSM_SUBMIT_CMD_CTX_RESTORE_BUF:
-			if (gpu->cur_ctx_seqno == submit->queue->ctx->seqno)
+			if (ring->cur_ctx_seqno == submit->queue->ctx->seqno)
 				break;
 			fallthrough;
 		case MSM_SUBMIT_CMD_BUF:
@@ -305,7 +305,7 @@ static void a7xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
 		case MSM_SUBMIT_CMD_IB_TARGET_BUF:
 			break;
 		case MSM_SUBMIT_CMD_CTX_RESTORE_BUF:
-			if (gpu->cur_ctx_seqno == submit->queue->ctx->seqno)
+			if (ring->cur_ctx_seqno == submit->queue->ctx->seqno)
 				break;
 			fallthrough;
 		case MSM_SUBMIT_CMD_BUF:
@@ -843,6 +843,7 @@ static int hw_init(struct msm_gpu *gpu)
 	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
 	struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
 	u64 gmem_range_min;
+	unsigned int i;
 	int ret;
 
 	if (!adreno_has_gmu_wrapper(adreno_gpu)) {
@@ -1138,7 +1139,8 @@ static int hw_init(struct msm_gpu *gpu)
 	/* Always come up on rb 0 */
 	a6xx_gpu->cur_ring = gpu->rb[0];
 
-	gpu->cur_ctx_seqno = 0;
+	for (i = 0; i < gpu->nr_rings; i++)
+		gpu->rb[i]->cur_ctx_seqno = 0;
 
 	/* Enable the SQE_to start the CP engine */
 	gpu_write(gpu, REG_A6XX_CP_SQE_CNTL, 1);
diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
index 3666b42b4ecd..c063b3896dc1 100644
--- a/drivers/gpu/drm/msm/msm_gpu.c
+++ b/drivers/gpu/drm/msm/msm_gpu.c
@@ -783,7 +783,7 @@ void msm_gpu_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
 	mutex_unlock(&gpu->active_lock);
 
 	gpu->funcs->submit(gpu, submit);
-	gpu->cur_ctx_seqno = submit->queue->ctx->seqno;
+	submit->ring->cur_ctx_seqno = submit->queue->ctx->seqno;
 
 	pm_runtime_put(&gpu->pdev->dev);
 	hangcheck_timer_reset(gpu);
diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h
index 1f02bb9956be..7cabc8480d7c 100644
--- a/drivers/gpu/drm/msm/msm_gpu.h
+++ b/drivers/gpu/drm/msm/msm_gpu.h
@@ -193,17 +193,6 @@ struct msm_gpu {
 	 */
 	refcount_t sysprof_active;
 
-	/**
-	 * cur_ctx_seqno:
-	 *
-	 * The ctx->seqno value of the last context to submit rendering,
-	 * and the one with current pgtables installed (for generations
-	 * that support per-context pgtables).  Tracked by seqno rather
-	 * than pointer value to avoid dangling pointers, and cases where
-	 * a ctx can be freed and a new one created with the same address.
-	 */
-	int cur_ctx_seqno;
-
 	/**
 	 * lock:
 	 *
diff --git a/drivers/gpu/drm/msm/msm_ringbuffer.h b/drivers/gpu/drm/msm/msm_ringbuffer.h
index 40791b2ade46..174f83137a49 100644
--- a/drivers/gpu/drm/msm/msm_ringbuffer.h
+++ b/drivers/gpu/drm/msm/msm_ringbuffer.h
@@ -100,6 +100,16 @@ struct msm_ringbuffer {
 	 * preemption.  Can be aquired from irq context.
 	 */
 	spinlock_t preempt_lock;
+
+	/**
+	 * cur_ctx_seqno:
+	 *
+	 * The ctx->seqno value of the last context to submit to this ring
+	 * Tracked by seqno rather than pointer value to avoid dangling
+	 * pointers, and cases where a ctx can be freed and a new one created
+	 * with the same address.
+	 */
+	int cur_ctx_seqno;
 };
 
 struct msm_ringbuffer *msm_ringbuffer_new(struct msm_gpu *gpu, int id,

-- 
2.46.0


* [PATCH v4 03/11] drm/msm: Add a `preempt_record_size` field
  2024-09-17 11:14 [PATCH v4 00/11] Preemption support for A7XX Antonino Maniscalco
  2024-09-17 11:14 ` [PATCH v4 01/11] drm/msm: Fix bv_fence being used as bv_rptr Antonino Maniscalco
  2024-09-17 11:14 ` [PATCH v4 02/11] drm/msm/A6XX: Track current_ctx_seqno per ring Antonino Maniscalco
@ 2024-09-17 11:14 ` Antonino Maniscalco
  2024-09-17 11:14 ` [PATCH v4 04/11] drm/msm: Add CONTEXT_SWITCH_CNTL bitfields Antonino Maniscalco
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 35+ messages in thread
From: Antonino Maniscalco @ 2024-09-17 11:14 UTC
  To: Rob Clark, Sean Paul, Konrad Dybcio, Abhinav Kumar,
	Dmitry Baryshkov, Marijn Suijten, David Airlie, Daniel Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet
  Cc: linux-arm-msm, dri-devel, freedreno, linux-kernel, linux-doc,
	Antonino Maniscalco, Akhil P Oommen, Neil Armstrong

Add a field to `adreno_info` to store the GPU-specific preempt record
size.

Signed-off-by: Antonino Maniscalco <antomani103@gmail.com>
Reviewed-by: Akhil P Oommen <quic_akhilpo@quicinc.com>
Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8650-QRD
---
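Note (illustration only, not part of the patch): the catalog values are
given in KiB. The sketch below converts them to bytes and estimates the
total backing memory when each ring gets its own record, as the per-ring
`preempt_iova[ring->id]` indexing later in the series suggests. The table
and helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_SZ_1K 1024ULL

/* Record sizes as listed in the catalog hunk, in KiB, in hunk order. */
static const uint64_t sketch_record_kib[] = { 2860, 4192, 4192, 3572 };

/* Convert one catalog entry to bytes. */
static uint64_t sketch_record_bytes(unsigned int idx)
{
	assert(idx < sizeof(sketch_record_kib) / sizeof(sketch_record_kib[0]));
	return sketch_record_kib[idx] * SKETCH_SZ_1K;
}

/* Total backing memory if every ring gets its own record. */
static uint64_t sketch_total_bytes(unsigned int idx, unsigned int nr_rings)
{
	return sketch_record_bytes(idx) * nr_rings;
}
```

For example, a 4192 KiB record with four rings comes to roughly 16.4 MiB of
preemption state.
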
 drivers/gpu/drm/msm/adreno/a6xx_catalog.c | 4 ++++
 drivers/gpu/drm/msm/adreno/adreno_gpu.h   | 1 +
 2 files changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_catalog.c b/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
index 68ba9aed5506..316f23ca9167 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
@@ -1190,6 +1190,7 @@ static const struct adreno_info a7xx_gpus[] = {
 			.protect = &a730_protect,
 		},
 		.address_space_size = SZ_16G,
+		.preempt_record_size = 2860 * SZ_1K,
 	}, {
 		.chip_ids = ADRENO_CHIP_IDS(0x43050a01), /* "C510v2" */
 		.family = ADRENO_7XX_GEN2,
@@ -1209,6 +1210,7 @@ static const struct adreno_info a7xx_gpus[] = {
 			.gmu_chipid = 0x7020100,
 		},
 		.address_space_size = SZ_16G,
+		.preempt_record_size = 4192 * SZ_1K,
 	}, {
 		.chip_ids = ADRENO_CHIP_IDS(0x43050c01), /* "C512v2" */
 		.family = ADRENO_7XX_GEN2,
@@ -1227,6 +1229,7 @@ static const struct adreno_info a7xx_gpus[] = {
 			.gmu_chipid = 0x7050001,
 		},
 		.address_space_size = SZ_256G,
+		.preempt_record_size = 4192 * SZ_1K,
 	}, {
 		.chip_ids = ADRENO_CHIP_IDS(0x43051401), /* "C520v2" */
 		.family = ADRENO_7XX_GEN3,
@@ -1245,6 +1248,7 @@ static const struct adreno_info a7xx_gpus[] = {
 			.gmu_chipid = 0x7090100,
 		},
 		.address_space_size = SZ_16G,
+		.preempt_record_size = 3572 * SZ_1K,
 	}
 };
 DECLARE_ADRENO_GPULIST(a7xx);
diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.h b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
index 1ab523a163a0..6b1888280a83 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.h
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
@@ -111,6 +111,7 @@ struct adreno_info {
 	 * {SHRT_MAX, 0} sentinal.
 	 */
 	struct adreno_speedbin *speedbins;
+	u64 preempt_record_size;
 };
 
 #define ADRENO_CHIP_IDS(tbl...) (uint32_t[]) { tbl, 0 }

-- 
2.46.0


* [PATCH v4 04/11] drm/msm: Add CONTEXT_SWITCH_CNTL bitfields
  2024-09-17 11:14 [PATCH v4 00/11] Preemption support for A7XX Antonino Maniscalco
                   ` (2 preceding siblings ...)
  2024-09-17 11:14 ` [PATCH v4 03/11] drm/msm: Add a `preempt_record_size` field Antonino Maniscalco
@ 2024-09-17 11:14 ` Antonino Maniscalco
  2024-09-17 11:14 ` [PATCH v4 05/11] drm/msm/A6xx: Implement preemption for A7XX targets Antonino Maniscalco
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 35+ messages in thread
From: Antonino Maniscalco @ 2024-09-17 11:14 UTC
  To: Rob Clark, Sean Paul, Konrad Dybcio, Abhinav Kumar,
	Dmitry Baryshkov, Marijn Suijten, David Airlie, Daniel Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet
  Cc: linux-arm-msm, dri-devel, freedreno, linux-kernel, linux-doc,
	Antonino Maniscalco

Add missing bitfields to CONTEXT_SWITCH_CNTL in a6xx.xml.

Signed-off-by: Antonino Maniscalco <antomani103@gmail.com>
---
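Note (illustration only, not part of the patch): a sketch of how a value
for this register could be packed from the bitfields added in the XML hunk
below. The bit positions are taken from that hunk; the macro and function
names are local to this sketch and are not the generated header names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_CNTL_STOP              (1u << 0)
#define SKETCH_CNTL_LEVEL_SHIFT       6
#define SKETCH_CNTL_LEVEL_MASK        (0x3u << SKETCH_CNTL_LEVEL_SHIFT)
#define SKETCH_CNTL_USES_GMEM         (1u << 8)
#define SKETCH_CNTL_SKIP_SAVE_RESTORE (1u << 9)

/* Pack the four fields into one 32-bit register value. */
static uint32_t sketch_cntl_pack(bool stop, unsigned int level,
				 bool uses_gmem, bool skip_save_restore)
{
	uint32_t val = 0;

	assert(level <= 3);  /* LEVEL is a two-bit field (bits 6..7) */

	if (stop)
		val |= SKETCH_CNTL_STOP;
	val |= (level << SKETCH_CNTL_LEVEL_SHIFT) & SKETCH_CNTL_LEVEL_MASK;
	if (uses_gmem)
		val |= SKETCH_CNTL_USES_GMEM;
	if (skip_save_restore)
		val |= SKETCH_CNTL_SKIP_SAVE_RESTORE;

	return val;
}

/* Extract the LEVEL field from a packed value. */
static unsigned int sketch_cntl_level(uint32_t val)
{
	return (val & SKETCH_CNTL_LEVEL_MASK) >> SKETCH_CNTL_LEVEL_SHIFT;
}
```

Bits 1 to 5 are left untouched here because the hunk does not name them.
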
 drivers/gpu/drm/msm/registers/adreno/a6xx.xml | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/registers/adreno/a6xx.xml b/drivers/gpu/drm/msm/registers/adreno/a6xx.xml
index 2dfe6913ab4f..fd31d1d7a11e 100644
--- a/drivers/gpu/drm/msm/registers/adreno/a6xx.xml
+++ b/drivers/gpu/drm/msm/registers/adreno/a6xx.xml
@@ -1337,7 +1337,12 @@ to upconvert to 32b float internally?
 		<reg32 offset="0x0" name="REG" type="a6x_cp_protect"/>
 	</array>
 
-	<reg32 offset="0x08A0" name="CP_CONTEXT_SWITCH_CNTL"/>
+	<reg32 offset="0x08A0" name="CP_CONTEXT_SWITCH_CNTL">
+		<bitfield name="STOP" pos="0" type="boolean"/>
+		<bitfield name="LEVEL" low="6" high="7"/>
+		<bitfield name="USES_GMEM" pos="8" type="boolean"/>
+		<bitfield name="SKIP_SAVE_RESTORE" pos="9" type="boolean"/>
+	</reg32>
 	<reg64 offset="0x08A1" name="CP_CONTEXT_SWITCH_SMMU_INFO"/>
 	<reg64 offset="0x08A3" name="CP_CONTEXT_SWITCH_PRIV_NON_SECURE_RESTORE_ADDR"/>
 	<reg64 offset="0x08A5" name="CP_CONTEXT_SWITCH_PRIV_SECURE_RESTORE_ADDR"/>

-- 
2.46.0


* [PATCH v4 05/11] drm/msm/A6xx: Implement preemption for A7XX targets
  2024-09-17 11:14 [PATCH v4 00/11] Preemption support for A7XX Antonino Maniscalco
                   ` (3 preceding siblings ...)
  2024-09-17 11:14 ` [PATCH v4 04/11] drm/msm: Add CONTEXT_SWITCH_CNTL bitfields Antonino Maniscalco
@ 2024-09-17 11:14 ` Antonino Maniscalco
  2024-09-20 16:37   ` Akhil P Oommen
  2024-09-17 11:14 ` [PATCH v4 06/11] drm/msm/A6xx: Sync relevant adreno_pm4.xml changes Antonino Maniscalco
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 35+ messages in thread
From: Antonino Maniscalco @ 2024-09-17 11:14 UTC
  To: Rob Clark, Sean Paul, Konrad Dybcio, Abhinav Kumar,
	Dmitry Baryshkov, Marijn Suijten, David Airlie, Daniel Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet
  Cc: linux-arm-msm, dri-devel, freedreno, linux-kernel, linux-doc,
	Antonino Maniscalco, Sharat Masetty, Neil Armstrong

This patch implements preemption for A6xx targets, which allows the GPU
to switch to a higher priority ringbuffer if one is ready. A6XX hardware
supports multiple levels of preemption granularity, ranging from
coarse-grained (ringbuffer level) to finer-grained levels such as
draw-call or bin-boundary preemption. This patch enables the basic
preemption level, with finer-grained preemption support to follow.

Signed-off-by: Sharat Masetty <smasetty@codeaurora.org>
Signed-off-by: Antonino Maniscalco <antomani103@gmail.com>
Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8650-QRD
---
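Note (illustration only, not part of the patch): the reworked
`a6xx_flush()` below only writes the hardware write pointer when the ring
being flushed is the current one and no ring switch is in flight;
otherwise it sets `restore_wptr`. The sketch models that decision without
locking. The struct layout is hypothetical, and the reading of
`restore_wptr` as "write the wptr once this ring becomes current" is an
assumption based on the field name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct sketch_flush_ring {
	uint32_t wptr;      /* software write pointer */
	bool restore_wptr;  /* assumed: wptr is written on the next switch-in */
};

struct sketch_flush_gpu {
	struct sketch_flush_ring *cur_ring;  /* ring the CP is executing */
	bool in_preempt;                     /* a ring switch is in flight */
	uint32_t hw_wptr;                    /* stand-in for CP_RB_WPTR */
};

/* Mirrors the shape of the new a6xx_flush() body, minus locking. */
static void sketch_flush(struct sketch_flush_gpu *gpu,
			 struct sketch_flush_ring *ring)
{
	assert(gpu && ring);

	if (!gpu->in_preempt && gpu->cur_ring == ring)
		gpu->hw_wptr = ring->wptr;   /* safe to update the hardware */
	else
		ring->restore_wptr = true;   /* defer until the ring is current */
}
```

Deferring the write keeps a submit to a non-current ring from moving the
write pointer of whichever ring the CP is executing at that moment.
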
 drivers/gpu/drm/msm/Makefile              |   1 +
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c     | 283 +++++++++++++++++++++-
 drivers/gpu/drm/msm/adreno/a6xx_gpu.h     | 168 +++++++++++++
 drivers/gpu/drm/msm/adreno/a6xx_preempt.c | 377 ++++++++++++++++++++++++++++++
 drivers/gpu/drm/msm/msm_ringbuffer.h      |   7 +
 5 files changed, 825 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/msm/Makefile b/drivers/gpu/drm/msm/Makefile
index f5e2838c6a76..32e915109a59 100644
--- a/drivers/gpu/drm/msm/Makefile
+++ b/drivers/gpu/drm/msm/Makefile
@@ -23,6 +23,7 @@ adreno-y := \
 	adreno/a6xx_gpu.o \
 	adreno/a6xx_gmu.o \
 	adreno/a6xx_hfi.o \
+	adreno/a6xx_preempt.o \
 
 adreno-$(CONFIG_DEBUG_FS) += adreno/a5xx_debugfs.o \
 
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index 6e065500b64d..355a3e210335 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -16,6 +16,84 @@
 
 #define GPU_PAS_ID 13
 
+/* IFPC & Preemption static powerup restore list */
+static const uint32_t a7xx_pwrup_reglist[] = {
+	REG_A6XX_UCHE_TRAP_BASE,
+	REG_A6XX_UCHE_TRAP_BASE + 1,
+	REG_A6XX_UCHE_WRITE_THRU_BASE,
+	REG_A6XX_UCHE_WRITE_THRU_BASE + 1,
+	REG_A6XX_UCHE_GMEM_RANGE_MIN,
+	REG_A6XX_UCHE_GMEM_RANGE_MIN + 1,
+	REG_A6XX_UCHE_GMEM_RANGE_MAX,
+	REG_A6XX_UCHE_GMEM_RANGE_MAX + 1,
+	REG_A6XX_UCHE_CACHE_WAYS,
+	REG_A6XX_UCHE_MODE_CNTL,
+	REG_A6XX_RB_NC_MODE_CNTL,
+	REG_A6XX_RB_CMP_DBG_ECO_CNTL,
+	REG_A7XX_GRAS_NC_MODE_CNTL,
+	REG_A6XX_RB_CONTEXT_SWITCH_GMEM_SAVE_RESTORE,
+	REG_A6XX_UCHE_GBIF_GX_CONFIG,
+	REG_A6XX_UCHE_CLIENT_PF,
+	REG_A6XX_TPL1_DBG_ECO_CNTL1,
+};
+
+static const uint32_t a7xx_ifpc_pwrup_reglist[] = {
+	REG_A6XX_TPL1_NC_MODE_CNTL,
+	REG_A6XX_SP_NC_MODE_CNTL,
+	REG_A6XX_CP_DBG_ECO_CNTL,
+	REG_A6XX_CP_PROTECT_CNTL,
+	REG_A6XX_CP_PROTECT(0),
+	REG_A6XX_CP_PROTECT(1),
+	REG_A6XX_CP_PROTECT(2),
+	REG_A6XX_CP_PROTECT(3),
+	REG_A6XX_CP_PROTECT(4),
+	REG_A6XX_CP_PROTECT(5),
+	REG_A6XX_CP_PROTECT(6),
+	REG_A6XX_CP_PROTECT(7),
+	REG_A6XX_CP_PROTECT(8),
+	REG_A6XX_CP_PROTECT(9),
+	REG_A6XX_CP_PROTECT(10),
+	REG_A6XX_CP_PROTECT(11),
+	REG_A6XX_CP_PROTECT(12),
+	REG_A6XX_CP_PROTECT(13),
+	REG_A6XX_CP_PROTECT(14),
+	REG_A6XX_CP_PROTECT(15),
+	REG_A6XX_CP_PROTECT(16),
+	REG_A6XX_CP_PROTECT(17),
+	REG_A6XX_CP_PROTECT(18),
+	REG_A6XX_CP_PROTECT(19),
+	REG_A6XX_CP_PROTECT(20),
+	REG_A6XX_CP_PROTECT(21),
+	REG_A6XX_CP_PROTECT(22),
+	REG_A6XX_CP_PROTECT(23),
+	REG_A6XX_CP_PROTECT(24),
+	REG_A6XX_CP_PROTECT(25),
+	REG_A6XX_CP_PROTECT(26),
+	REG_A6XX_CP_PROTECT(27),
+	REG_A6XX_CP_PROTECT(28),
+	REG_A6XX_CP_PROTECT(29),
+	REG_A6XX_CP_PROTECT(30),
+	REG_A6XX_CP_PROTECT(31),
+	REG_A6XX_CP_PROTECT(32),
+	REG_A6XX_CP_PROTECT(33),
+	REG_A6XX_CP_PROTECT(34),
+	REG_A6XX_CP_PROTECT(35),
+	REG_A6XX_CP_PROTECT(36),
+	REG_A6XX_CP_PROTECT(37),
+	REG_A6XX_CP_PROTECT(38),
+	REG_A6XX_CP_PROTECT(39),
+	REG_A6XX_CP_PROTECT(40),
+	REG_A6XX_CP_PROTECT(41),
+	REG_A6XX_CP_PROTECT(42),
+	REG_A6XX_CP_PROTECT(43),
+	REG_A6XX_CP_PROTECT(44),
+	REG_A6XX_CP_PROTECT(45),
+	REG_A6XX_CP_PROTECT(46),
+	REG_A6XX_CP_PROTECT(47),
+	REG_A6XX_CP_AHB_CNTL,
+};
+
+
 static inline bool _a6xx_check_idle(struct msm_gpu *gpu)
 {
 	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
@@ -68,6 +146,8 @@ static void update_shadow_rptr(struct msm_gpu *gpu, struct msm_ringbuffer *ring)
 
 static void a6xx_flush(struct msm_gpu *gpu, struct msm_ringbuffer *ring)
 {
+	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
+	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
 	uint32_t wptr;
 	unsigned long flags;
 
@@ -81,12 +161,17 @@ static void a6xx_flush(struct msm_gpu *gpu, struct msm_ringbuffer *ring)
 	/* Make sure to wrap wptr if we need to */
 	wptr = get_wptr(ring);
 
-	spin_unlock_irqrestore(&ring->preempt_lock, flags);
-
-	/* Make sure everything is posted before making a decision */
-	mb();
+	/* Update HW if this is the current ring and we are not in preempt */
+	if (!a6xx_in_preempt(a6xx_gpu)) {
+		if (a6xx_gpu->cur_ring == ring)
+			gpu_write(gpu, REG_A6XX_CP_RB_WPTR, wptr);
+		else
+			ring->restore_wptr = true;
+	} else {
+		ring->restore_wptr = true;
+	}
 
-	gpu_write(gpu, REG_A6XX_CP_RB_WPTR, wptr);
+	spin_unlock_irqrestore(&ring->preempt_lock, flags);
 }
 
 static void get_stats_counter(struct msm_ringbuffer *ring, u32 counter,
@@ -138,12 +223,14 @@ static void a6xx_set_pagetable(struct a6xx_gpu *a6xx_gpu,
 
 	/*
 	 * Write the new TTBR0 to the memstore. This is good for debugging.
+	 * Needed for preemption
 	 */
-	OUT_PKT7(ring, CP_MEM_WRITE, 4);
+	OUT_PKT7(ring, CP_MEM_WRITE, 5);
 	OUT_RING(ring, CP_MEM_WRITE_0_ADDR_LO(lower_32_bits(memptr)));
 	OUT_RING(ring, CP_MEM_WRITE_1_ADDR_HI(upper_32_bits(memptr)));
 	OUT_RING(ring, lower_32_bits(ttbr));
-	OUT_RING(ring, (asid << 16) | upper_32_bits(ttbr));
+	OUT_RING(ring, upper_32_bits(ttbr));
+	OUT_RING(ring, ctx->seqno);
 
 	/*
 	 * Sync both threads after switching pagetables and enable BR only
@@ -268,6 +355,34 @@ static void a6xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
 	a6xx_flush(gpu, ring);
 }
 
+static void a6xx_emit_set_pseudo_reg(struct msm_ringbuffer *ring,
+		struct a6xx_gpu *a6xx_gpu, struct msm_gpu_submitqueue *queue)
+{
+	OUT_PKT7(ring, CP_SET_PSEUDO_REG, 12);
+
+	OUT_RING(ring, SMMU_INFO);
+	/* don't save SMMU, we write the record from the kernel instead */
+	OUT_RING(ring, 0);
+	OUT_RING(ring, 0);
+
+	/* privileged and non secure buffer save */
+	OUT_RING(ring, NON_SECURE_SAVE_ADDR);
+	OUT_RING(ring, lower_32_bits(
+		a6xx_gpu->preempt_iova[ring->id] + PREEMPT_OFFSET_PRIV_NON_SECURE));
+	OUT_RING(ring, upper_32_bits(
+		a6xx_gpu->preempt_iova[ring->id] + PREEMPT_OFFSET_PRIV_NON_SECURE));
+
+	/* user context buffer save, seems to be unused by fw */
+	OUT_RING(ring, NON_PRIV_SAVE_ADDR);
+	OUT_RING(ring, 0);
+	OUT_RING(ring, 0);
+
+	OUT_RING(ring, COUNTER);
+	/* seems OK to set to 0 to disable it */
+	OUT_RING(ring, 0);
+	OUT_RING(ring, 0);
+}
+
 static void a7xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
 {
 	unsigned int index = submit->seqno % MSM_GPU_SUBMIT_STATS_COUNT;
@@ -285,6 +400,13 @@ static void a7xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
 
 	a6xx_set_pagetable(a6xx_gpu, ring, submit->queue->ctx);
 
+	/*
+	 * If preemption is enabled, then set the pseudo register for the save
+	 * sequence
+	 */
+	if (gpu->nr_rings > 1)
+		a6xx_emit_set_pseudo_reg(ring, a6xx_gpu, submit->queue);
+
 	get_stats_counter(ring, REG_A7XX_RBBM_PERFCTR_CP(0),
 		rbmemptr_stats(ring, index, cpcycles_start));
 	get_stats_counter(ring, REG_A6XX_CP_ALWAYS_ON_COUNTER,
@@ -376,6 +498,8 @@ static void a7xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
 	OUT_RING(ring, upper_32_bits(rbmemptr(ring, bv_fence)));
 	OUT_RING(ring, submit->seqno);
 
+	a6xx_gpu->last_seqno[ring->id] = submit->seqno;
+
 	/* write the ringbuffer timestamp */
 	OUT_PKT7(ring, CP_EVENT_WRITE, 4);
 	OUT_RING(ring, CACHE_CLEAN | CP_EVENT_WRITE_0_IRQ | BIT(27));
@@ -389,10 +513,32 @@ static void a7xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
 	OUT_PKT7(ring, CP_SET_MARKER, 1);
 	OUT_RING(ring, 0x100); /* IFPC enable */
 
+	/* If preemption is enabled */
+	if (gpu->nr_rings > 1) {
+		/* Yield the floor on command completion */
+		OUT_PKT7(ring, CP_CONTEXT_SWITCH_YIELD, 4);
+
+		/*
+		 * If dword[2:1] are non zero, they specify an address for
+		 * the CP to write the value of dword[3] to on preemption
+		 * complete. Write 0 to skip the write
+		 */
+		OUT_RING(ring, 0x00);
+		OUT_RING(ring, 0x00);
+		/* Data value - not used if the address above is 0 */
+		OUT_RING(ring, 0x01);
+		/* generate interrupt on preemption completion */
+		OUT_RING(ring, 0x00);
+	}
+
+
 	trace_msm_gpu_submit_flush(submit,
 		gpu_read64(gpu, REG_A6XX_CP_ALWAYS_ON_COUNTER));
 
 	a6xx_flush(gpu, ring);
+
+	/* Check to see if we need to start preemption */
+	a6xx_preempt_trigger(gpu);
 }
 
 static void a6xx_set_hwcg(struct msm_gpu *gpu, bool state)
@@ -588,6 +734,89 @@ static void a6xx_set_ubwc_config(struct msm_gpu *gpu)
 		  adreno_gpu->ubwc_config.min_acc_len << 23 | hbb_lo << 21);
 }
 
+static void a7xx_patch_pwrup_reglist(struct msm_gpu *gpu)
+{
+	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
+	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
+	struct adreno_reglist_list reglist[2];
+	void *ptr = a6xx_gpu->pwrup_reglist_ptr;
+	struct cpu_gpu_lock *lock = ptr;
+	u32 *dest = (u32 *)&lock->regs[0];
+	int i, j;
+
+	lock->gpu_req = lock->cpu_req = lock->turn = 0;
+	lock->ifpc_list_len = ARRAY_SIZE(a7xx_ifpc_pwrup_reglist);
+	lock->preemption_list_len = ARRAY_SIZE(a7xx_pwrup_reglist);
+
+	/* Static IFPC-only registers */
+	reglist[0].regs = a7xx_ifpc_pwrup_reglist;
+	reglist[0].count = ARRAY_SIZE(a7xx_ifpc_pwrup_reglist);
+	lock->ifpc_list_len = reglist[0].count;
+
+	/* Static IFPC + preemption registers */
+	reglist[1].regs = a7xx_pwrup_reglist;
+	reglist[1].count = ARRAY_SIZE(a7xx_pwrup_reglist);
+	lock->preemption_list_len = reglist[1].count;
+
+	/*
+	 * For each entry in each of the lists, write the offset and the current
+	 * register value into the GPU buffer
+	 */
+	for (i = 0; i < 2; i++) {
+		const u32 *r = reglist[i].regs;
+
+		for (j = 0; j < reglist[i].count; j++) {
+			*dest++ = r[j];
+			*dest++ = gpu_read(gpu, r[j]);
+		}
+	}
+
+	/*
+	 * The overall register list is composed of
+	 * 1. Static IFPC-only registers
+	 * 2. Static IFPC + preemption registers
+	 * 3. Dynamic IFPC + preemption registers (ex: perfcounter selects)
+	 *
+	 * The first two lists are static. The sizes of these lists are stored
+	 * as a number of pairs in ifpc_list_len and preemption_list_len
+	 * respectively. With concurrent binning, some of the perfcounter
+	 * registers are virtualized, so the CP needs to know the pipe id to
+	 * program the aperture in order to restore them. Thus, the third list
+	 * is a dynamic list of triplets
+	 * (<aperture, shifted 12 bits> <address> <data>), and its length is
+	 * stored as a number of triplets in dynamic_list_len.
+	 */
+	lock->dynamic_list_len = 0;
+}
+
+static int a7xx_preempt_start(struct msm_gpu *gpu)
+{
+	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
+	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
+	struct msm_ringbuffer *ring = gpu->rb[0];
+
+	if (gpu->nr_rings <= 1)
+		return 0;
+
+	/* Turn CP protection off */
+	OUT_PKT7(ring, CP_SET_PROTECTED_MODE, 1);
+	OUT_RING(ring, 0);
+
+	a6xx_emit_set_pseudo_reg(ring, a6xx_gpu, NULL);
+
+	/* Yield the floor on command completion */
+	OUT_PKT7(ring, CP_CONTEXT_SWITCH_YIELD, 4);
+	OUT_RING(ring, 0x00);
+	OUT_RING(ring, 0x00);
+	OUT_RING(ring, 0x00);
+	/* Generate interrupt on preemption completion */
+	OUT_RING(ring, 0x00);
+
+	a6xx_flush(gpu, ring);
+
+	return a6xx_idle(gpu, ring) ? 0 : -EINVAL;
+}
+
 static int a6xx_cp_init(struct msm_gpu *gpu)
 {
 	struct msm_ringbuffer *ring = gpu->rb[0];
@@ -619,6 +848,8 @@ static int a6xx_cp_init(struct msm_gpu *gpu)
 
 static int a7xx_cp_init(struct msm_gpu *gpu)
 {
+	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
+	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
 	struct msm_ringbuffer *ring = gpu->rb[0];
 	u32 mask;
 
@@ -656,11 +887,11 @@ static int a7xx_cp_init(struct msm_gpu *gpu)
 
 	/* *Don't* send a power up reg list for concurrent binning (TODO) */
 	/* Lo address */
-	OUT_RING(ring, 0x00000000);
+	OUT_RING(ring, lower_32_bits(a6xx_gpu->pwrup_reglist_iova));
 	/* Hi address */
-	OUT_RING(ring, 0x00000000);
+	OUT_RING(ring, upper_32_bits(a6xx_gpu->pwrup_reglist_iova));
 	/* BIT(31) set => read the regs from the list */
-	OUT_RING(ring, 0x00000000);
+	OUT_RING(ring, BIT(31));
 
 	a6xx_flush(gpu, ring);
 	return a6xx_idle(gpu, ring) ? 0 : -EINVAL;
@@ -784,6 +1015,16 @@ static int a6xx_ucode_load(struct msm_gpu *gpu)
 		msm_gem_object_set_name(a6xx_gpu->shadow_bo, "shadow");
 	}
 
+	a6xx_gpu->pwrup_reglist_ptr = msm_gem_kernel_new(gpu->dev, PAGE_SIZE,
+							 MSM_BO_WC  | MSM_BO_MAP_PRIV,
+							 gpu->aspace, &a6xx_gpu->pwrup_reglist_bo,
+							 &a6xx_gpu->pwrup_reglist_iova);
+
+	if (IS_ERR(a6xx_gpu->pwrup_reglist_ptr))
+		return PTR_ERR(a6xx_gpu->pwrup_reglist_ptr);
+
+	msm_gem_object_set_name(a6xx_gpu->pwrup_reglist_bo, "pwrup_reglist");
+
 	return 0;
 }
 
@@ -1128,6 +1369,8 @@ static int hw_init(struct msm_gpu *gpu)
 	if (a6xx_gpu->shadow_bo) {
 		gpu_write64(gpu, REG_A6XX_CP_RB_RPTR_ADDR,
 			shadowptr(a6xx_gpu, gpu->rb[0]));
+		for (unsigned int i = 0; i < gpu->nr_rings; i++)
+			a6xx_gpu->shadow[i] = 0;
 	}
 
 	/* ..which means "always" on A7xx, also for BV shadow */
@@ -1136,6 +1379,8 @@ static int hw_init(struct msm_gpu *gpu)
 			    rbmemptr(gpu->rb[0], bv_rptr));
 	}
 
+	a6xx_preempt_hw_init(gpu);
+
 	/* Always come up on rb 0 */
 	a6xx_gpu->cur_ring = gpu->rb[0];
 
@@ -1145,6 +1390,11 @@ static int hw_init(struct msm_gpu *gpu)
 	/* Enable the SQE_to start the CP engine */
 	gpu_write(gpu, REG_A6XX_CP_SQE_CNTL, 1);
 
+	if (adreno_is_a7xx(adreno_gpu) && !a6xx_gpu->pwrup_reglist_emitted) {
+		a7xx_patch_pwrup_reglist(gpu);
+		a6xx_gpu->pwrup_reglist_emitted = true;
+	}
+
 	ret = adreno_is_a7xx(adreno_gpu) ? a7xx_cp_init(gpu) : a6xx_cp_init(gpu);
 	if (ret)
 		goto out;
@@ -1182,6 +1432,10 @@ static int hw_init(struct msm_gpu *gpu)
 out:
 	if (adreno_has_gmu_wrapper(adreno_gpu))
 		return ret;
+
+	/* Last step - yield the ringbuffer */
+	a7xx_preempt_start(gpu);
+
 	/*
 	 * Tell the GMU that we are done touching the GPU and it can start power
 	 * management
@@ -1559,8 +1813,13 @@ static irqreturn_t a6xx_irq(struct msm_gpu *gpu)
 	if (status & A6XX_RBBM_INT_0_MASK_SWFUSEVIOLATION)
 		a7xx_sw_fuse_violation_irq(gpu);
 
-	if (status & A6XX_RBBM_INT_0_MASK_CP_CACHE_FLUSH_TS)
+	if (status & A6XX_RBBM_INT_0_MASK_CP_CACHE_FLUSH_TS) {
 		msm_gpu_retire(gpu);
+		a6xx_preempt_trigger(gpu);
+	}
+
+	if (status & A6XX_RBBM_INT_0_MASK_CP_SW)
+		a6xx_preempt_irq(gpu);
 
 	return IRQ_HANDLED;
 }
@@ -2333,6 +2592,8 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
 				a6xx_fault_handler);
 
 	a6xx_calc_ubwc_config(adreno_gpu);
+	/* Set up the preemption specific bits and pieces for each ringbuffer */
+	a6xx_preempt_init(gpu);
 
 	return gpu;
 }
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
index e3e5c53ae8af..7fc994121676 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
@@ -12,6 +12,31 @@
 
 extern bool hang_debug;
 
+struct cpu_gpu_lock {
+	uint32_t gpu_req;
+	uint32_t cpu_req;
+	uint32_t turn;
+	union {
+		struct {
+			uint16_t list_length;
+			uint16_t list_offset;
+		};
+		struct {
+			uint8_t ifpc_list_len;
+			uint8_t preemption_list_len;
+			uint16_t dynamic_list_len;
+		};
+	};
+	uint64_t regs[62];
+};
+
+struct adreno_reglist_list {
+	/** @regs: List of registers **/
+	const u32 *regs;
+	/** @count: Number of registers in the list **/
+	u32 count;
+};
+
 /**
  * struct a6xx_info - a6xx specific information from device table
  *
@@ -31,6 +56,20 @@ struct a6xx_gpu {
 	uint64_t sqe_iova;
 
 	struct msm_ringbuffer *cur_ring;
+	struct msm_ringbuffer *next_ring;
+
+	struct drm_gem_object *preempt_bo[MSM_GPU_MAX_RINGS];
+	void *preempt[MSM_GPU_MAX_RINGS];
+	uint64_t preempt_iova[MSM_GPU_MAX_RINGS];
+	uint32_t last_seqno[MSM_GPU_MAX_RINGS];
+
+	atomic_t preempt_state;
+	spinlock_t eval_lock;
+	struct timer_list preempt_timer;
+
+	unsigned int preempt_level;
+	bool uses_gmem;
+	bool skip_save_restore;
 
 	struct a6xx_gmu gmu;
 
@@ -38,6 +77,11 @@ struct a6xx_gpu {
 	uint64_t shadow_iova;
 	uint32_t *shadow;
 
+	struct drm_gem_object *pwrup_reglist_bo;
+	void *pwrup_reglist_ptr;
+	uint64_t pwrup_reglist_iova;
+	bool pwrup_reglist_emitted;
+
 	bool has_whereami;
 
 	void __iomem *llc_mmio;
@@ -49,6 +93,102 @@ struct a6xx_gpu {
 
 #define to_a6xx_gpu(x) container_of(x, struct a6xx_gpu, base)
 
+/*
+ * In order to do lockless preemption we use a simple state machine to progress
+ * through the process.
+ *
+ * PREEMPT_NONE - No preemption in progress. Next state: START.
+ * PREEMPT_START - The trigger is evaluating if preemption is possible. Next
+ * states: TRIGGERED, FINISH.
+ * PREEMPT_FINISH - An intermediate state before moving back to NONE. Next
+ * state: NONE.
+ * PREEMPT_TRIGGERED - A preemption has been executed on the hardware. Next
+ * states: FAULTED, PENDING.
+ * PREEMPT_FAULTED - A preemption timed out (never completed). This will
+ * trigger recovery. Next state: N/A.
+ * PREEMPT_PENDING - Preemption complete interrupt fired and the callback is
+ * checking the success of the operation. Next states: FAULTED, FINISH.
+ */
+
+enum a6xx_preempt_state {
+	PREEMPT_NONE = 0,
+	PREEMPT_START,
+	PREEMPT_FINISH,
+	PREEMPT_TRIGGERED,
+	PREEMPT_FAULTED,
+	PREEMPT_PENDING,
+};
+
+/*
+ * struct a6xx_preempt_record is a shared buffer between the microcode and the
+ * CPU to store the state for preemption. The record itself is much larger
+ * (2112k) but most of that is used by the CP for storage.
+ *
+ * There is a preemption record assigned per ringbuffer. When the CPU triggers a
+ * preemption, it fills out the record with the useful information (wptr, ring
+ * base, etc) and the microcode uses that information to set up the CP following
+ * the preemption.  When a ring is switched out, the CP will save the ringbuffer
+ * state back to the record. In this way, once the records are properly set up
+ * the CPU can quickly switch back and forth between ringbuffers by only
+ * updating a few registers (often only the wptr).
+ *
+ * These are the CPU aware registers in the record:
+ * @magic: Must always be 0xAE399D6EUL
+ * @info: Type of the record - written 0 by the CPU, updated by the CP
+ * @errno: preemption error record
+ * @data: Data field in YIELD and SET_MARKER packets, Written and used by CP
+ * @cntl: Value of RB_CNTL written by CPU, save/restored by CP
+ * @rptr: Value of RB_RPTR written by CPU, save/restored by CP
+ * @wptr: Value of RB_WPTR written by CPU, save/restored by CP
+ * @_pad: Reserved/padding
+ * @rptr_addr: Value of RB_RPTR_ADDR_LO|HI written by CPU, save/restored by CP
+ * @rbase: Value of RB_BASE written by CPU, save/restored by CP
+ * @counter: GPU address of the storage area for the preemption counters
+ * @bv_rptr_addr: Value of BV_RB_RPTR_ADDR_LO|HI written by CPU, save/restored by CP
+ */
+struct a6xx_preempt_record {
+	u32 magic;
+	u32 info;
+	u32 errno;
+	u32 data;
+	u32 cntl;
+	u32 rptr;
+	u32 wptr;
+	u32 _pad;
+	u64 rptr_addr;
+	u64 rbase;
+	u64 counter;
+	u64 bv_rptr_addr;
+};
+
+#define A6XX_PREEMPT_RECORD_MAGIC 0xAE399D6EUL
+
+#define PREEMPT_RECORD_SIZE_FALLBACK(size) \
+	((size) == 0 ? 4192 * SZ_1K : (size))
+
+#define PREEMPT_OFFSET_SMMU_INFO 0
+#define PREEMPT_OFFSET_PRIV_NON_SECURE (PREEMPT_OFFSET_SMMU_INFO + 4096)
+#define PREEMPT_SIZE(size) \
+	(PREEMPT_OFFSET_PRIV_NON_SECURE + PREEMPT_RECORD_SIZE_FALLBACK(size))
+
+/*
+ * The preemption counter block is a storage area for the value of the
+ * preemption counters that are saved immediately before context switch. We
+ * append it on to the end of the allocation for the preemption record.
+ */
+#define A6XX_PREEMPT_COUNTER_SIZE (16 * 4)
+
+struct a7xx_cp_smmu_info {
+	u32 magic;
+	u32 _pad4;
+	u64 ttbr0;
+	u32 asid;
+	u32 context_idr;
+	u32 context_bank;
+};
+
+#define GEN7_CP_SMMU_INFO_MAGIC 0x241350d5UL
+
 /*
  * Given a register and a count, return a value to program into
  * REG_CP_PROTECT_REG(n) - this will block both reads and writes for
@@ -106,6 +246,34 @@ int a6xx_gmu_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node);
 int a6xx_gmu_wrapper_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node);
 void a6xx_gmu_remove(struct a6xx_gpu *a6xx_gpu);
 
+void a6xx_preempt_init(struct msm_gpu *gpu);
+void a6xx_preempt_hw_init(struct msm_gpu *gpu);
+void a6xx_preempt_trigger(struct msm_gpu *gpu);
+void a6xx_preempt_irq(struct msm_gpu *gpu);
+void a6xx_preempt_fini(struct msm_gpu *gpu);
+int a6xx_preempt_submitqueue_setup(struct msm_gpu *gpu,
+		struct msm_gpu_submitqueue *queue);
+void a6xx_preempt_submitqueue_close(struct msm_gpu *gpu,
+		struct msm_gpu_submitqueue *queue);
+
+/* Return true if we are in a preempt state */
+static inline bool a6xx_in_preempt(struct a6xx_gpu *a6xx_gpu)
+{
+	/*
+	 * Make sure the read to preempt_state is ordered with respect to reads
+	 * of other variables before ...
+	 */
+	smp_rmb();
+
+	int preempt_state = atomic_read(&a6xx_gpu->preempt_state);
+
+	/* ... and after. */
+	smp_rmb();
+
+	return !(preempt_state == PREEMPT_NONE ||
+			preempt_state == PREEMPT_FINISH);
+}
+
 void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp,
 		       bool suspended);
 unsigned long a6xx_gmu_get_freq(struct msm_gpu *gpu);
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_preempt.c b/drivers/gpu/drm/msm/adreno/a6xx_preempt.c
new file mode 100644
index 000000000000..aa4bad394f9e
--- /dev/null
+++ b/drivers/gpu/drm/msm/adreno/a6xx_preempt.c
@@ -0,0 +1,377 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2018, The Linux Foundation. All rights reserved. */
+/* Copyright (c) 2023 Collabora, Ltd. */
+/* Copyright (c) 2024 Valve Corporation */
+
+#include "msm_gem.h"
+#include "a6xx_gpu.h"
+#include "a6xx_gmu.xml.h"
+#include "msm_mmu.h"
+
+/*
+ * Try to transition the preemption state from old to new. Return
+ * true on success or false if the original state wasn't 'old'
+ */
+static inline bool try_preempt_state(struct a6xx_gpu *a6xx_gpu,
+		enum a6xx_preempt_state old, enum a6xx_preempt_state new)
+{
+	enum a6xx_preempt_state cur = atomic_cmpxchg(&a6xx_gpu->preempt_state,
+		old, new);
+
+	return (cur == old);
+}
+
+/*
+ * Force the preemption state to the specified state.  This is used in cases
+ * where the current state is known and won't change
+ */
+static inline void set_preempt_state(struct a6xx_gpu *gpu,
+		enum a6xx_preempt_state new)
+{
+	/*
+	 * preempt_state may be read by other cores trying to trigger a
+	 * preemption or in the interrupt handler so barriers are needed
+	 * before...
+	 */
+	smp_mb__before_atomic();
+	atomic_set(&gpu->preempt_state, new);
+	/* ... and after */
+	smp_mb__after_atomic();
+}
+
+/* Write the most recent wptr for the given ring into the hardware */
+static inline void update_wptr(struct msm_gpu *gpu, struct msm_ringbuffer *ring)
+{
+	unsigned long flags;
+	uint32_t wptr;
+
+	spin_lock_irqsave(&ring->preempt_lock, flags);
+
+	if (ring->restore_wptr) {
+		wptr = get_wptr(ring);
+
+		gpu_write(gpu, REG_A6XX_CP_RB_WPTR, wptr);
+
+		ring->restore_wptr = false;
+	}
+
+	spin_unlock_irqrestore(&ring->preempt_lock, flags);
+}
+
+/* Return the highest priority ringbuffer with something in it */
+static struct msm_ringbuffer *get_next_ring(struct msm_gpu *gpu)
+{
+	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
+	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
+
+	unsigned long flags;
+	int i;
+
+	for (i = 0; i < gpu->nr_rings; i++) {
+		bool empty;
+		struct msm_ringbuffer *ring = gpu->rb[i];
+
+		spin_lock_irqsave(&ring->preempt_lock, flags);
+		empty = (get_wptr(ring) == gpu->funcs->get_rptr(gpu, ring));
+		if (!empty && ring == a6xx_gpu->cur_ring)
+			empty = ring->memptrs->fence == a6xx_gpu->last_seqno[i];
+		spin_unlock_irqrestore(&ring->preempt_lock, flags);
+
+		if (!empty)
+			return ring;
+	}
+
+	return NULL;
+}
+
+static void a6xx_preempt_timer(struct timer_list *t)
+{
+	struct a6xx_gpu *a6xx_gpu = from_timer(a6xx_gpu, t, preempt_timer);
+	struct msm_gpu *gpu = &a6xx_gpu->base.base;
+	struct drm_device *dev = gpu->dev;
+
+	if (!try_preempt_state(a6xx_gpu, PREEMPT_TRIGGERED, PREEMPT_FAULTED))
+		return;
+
+	dev_err(dev->dev, "%s: preemption timed out\n", gpu->name);
+	kthread_queue_work(gpu->worker, &gpu->recover_work);
+}
+
+void a6xx_preempt_irq(struct msm_gpu *gpu)
+{
+	uint32_t status;
+	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
+	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
+	struct drm_device *dev = gpu->dev;
+
+	if (!try_preempt_state(a6xx_gpu, PREEMPT_TRIGGERED, PREEMPT_PENDING))
+		return;
+
+	/* Delete the preemption watchdog timer */
+	del_timer(&a6xx_gpu->preempt_timer);
+
+	/*
+	 * The hardware should be setting the stop bit of CP_CONTEXT_SWITCH_CNTL
+	 * to zero before firing the interrupt, but there is a non zero chance
+	 * of a hardware condition or a software race that could set it again
+	 * before we have a chance to finish. If that happens, log and go for
+	 * recovery
+	 */
+	status = gpu_read(gpu, REG_A6XX_CP_CONTEXT_SWITCH_CNTL);
+	if (unlikely(status & A6XX_CP_CONTEXT_SWITCH_CNTL_STOP)) {
+		DRM_DEV_ERROR(&gpu->pdev->dev,
+					  "!!!!!!!!!!!!!!!! preemption faulted !!!!!!!!!!!!!! irq\n");
+		set_preempt_state(a6xx_gpu, PREEMPT_FAULTED);
+		dev_err(dev->dev, "%s: Preemption failed to complete\n",
+			gpu->name);
+		kthread_queue_work(gpu->worker, &gpu->recover_work);
+		return;
+	}
+
+	a6xx_gpu->cur_ring = a6xx_gpu->next_ring;
+	a6xx_gpu->next_ring = NULL;
+
+	set_preempt_state(a6xx_gpu, PREEMPT_FINISH);
+
+	update_wptr(gpu, a6xx_gpu->cur_ring);
+
+	set_preempt_state(a6xx_gpu, PREEMPT_NONE);
+
+	/*
+	 * Retrigger preemption to avoid a deadlock that might occur when preemption
+	 * is skipped due to it being already in flight when requested.
+	 */
+	a6xx_preempt_trigger(gpu);
+}
+
+void a6xx_preempt_hw_init(struct msm_gpu *gpu)
+{
+	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
+	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
+	int i;
+
+	/* No preemption if we only have one ring */
+	if (gpu->nr_rings == 1)
+		return;
+
+	for (i = 0; i < gpu->nr_rings; i++) {
+		struct a6xx_preempt_record *record_ptr =
+			a6xx_gpu->preempt[i] + PREEMPT_OFFSET_PRIV_NON_SECURE;
+		record_ptr->wptr = 0;
+		record_ptr->rptr = 0;
+		record_ptr->rptr_addr = shadowptr(a6xx_gpu, gpu->rb[i]);
+		record_ptr->info = 0;
+		record_ptr->data = 0;
+		record_ptr->rbase = gpu->rb[i]->iova;
+	}
+
+	/* Write a 0 to signal that we aren't switching pagetables */
+	gpu_write64(gpu, REG_A6XX_CP_CONTEXT_SWITCH_SMMU_INFO, 0);
+
+	/* Enable the GMEM save/restore feature for preemption */
+	gpu_write(gpu, REG_A6XX_RB_CONTEXT_SWITCH_GMEM_SAVE_RESTORE, 0x1);
+
+	/* Reset the preemption state */
+	set_preempt_state(a6xx_gpu, PREEMPT_NONE);
+
+	spin_lock_init(&a6xx_gpu->eval_lock);
+
+	/* Always come up on rb 0 */
+	a6xx_gpu->cur_ring = gpu->rb[0];
+}
+
+void a6xx_preempt_trigger(struct msm_gpu *gpu)
+{
+	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
+	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
+	unsigned long flags;
+	struct msm_ringbuffer *ring;
+	unsigned int cntl;
+
+	if (gpu->nr_rings == 1)
+		return;
+
+	/*
+	 * Lock to make sure another thread attempting preemption doesn't skip it
+	 * while we are still evaluating the next ring. This makes sure the other
+	 * thread does start preemption if we abort it and avoids a soft lock.
+	 */
+	spin_lock_irqsave(&a6xx_gpu->eval_lock, flags);
+
+	/*
+	 * Try to start preemption by moving from NONE to START. If
+	 * unsuccessful, a preemption is already in flight
+	 */
+	if (!try_preempt_state(a6xx_gpu, PREEMPT_NONE, PREEMPT_START)) {
+		spin_unlock_irqrestore(&a6xx_gpu->eval_lock, flags);
+		return;
+	}
+
+	cntl = A6XX_CP_CONTEXT_SWITCH_CNTL_LEVEL(a6xx_gpu->preempt_level);
+
+	if (a6xx_gpu->skip_save_restore)
+		cntl |= A6XX_CP_CONTEXT_SWITCH_CNTL_SKIP_SAVE_RESTORE;
+
+	if (a6xx_gpu->uses_gmem)
+		cntl |= A6XX_CP_CONTEXT_SWITCH_CNTL_USES_GMEM;
+
+	cntl |= A6XX_CP_CONTEXT_SWITCH_CNTL_STOP;
+
+	/* Get the next ring to preempt to */
+	ring = get_next_ring(gpu);
+
+	/*
+	 * If no ring is populated or the highest priority ring is the current
+	 * one do nothing except to update the wptr to the latest and greatest
+	 */
+	if (!ring || (a6xx_gpu->cur_ring == ring)) {
+		set_preempt_state(a6xx_gpu, PREEMPT_FINISH);
+		update_wptr(gpu, a6xx_gpu->cur_ring);
+		set_preempt_state(a6xx_gpu, PREEMPT_NONE);
+		spin_unlock_irqrestore(&a6xx_gpu->eval_lock, flags);
+		return;
+	}
+
+	spin_unlock_irqrestore(&a6xx_gpu->eval_lock, flags);
+
+	spin_lock_irqsave(&ring->preempt_lock, flags);
+
+	struct a7xx_cp_smmu_info *smmu_info_ptr =
+		a6xx_gpu->preempt[ring->id] + PREEMPT_OFFSET_SMMU_INFO;
+	struct a6xx_preempt_record *record_ptr =
+		a6xx_gpu->preempt[ring->id] + PREEMPT_OFFSET_PRIV_NON_SECURE;
+	u64 ttbr0 = ring->memptrs->ttbr0;
+	u32 context_idr = ring->memptrs->context_idr;
+
+	smmu_info_ptr->ttbr0 = ttbr0;
+	smmu_info_ptr->context_idr = context_idr;
+	record_ptr->wptr = get_wptr(ring);
+
+	/*
+	 * The GPU will write the wptr we set above when we preempt. Reset
+	 * restore_wptr to make sure that we don't write WPTR to the same
+	 * thing twice. It's still possible subsequent submissions will update
+	 * wptr again, in which case they will set the flag to true. This has
+	 * to be protected by the lock for setting the flag and updating wptr
+	 * to be atomic.
+	 */
+	ring->restore_wptr = false;
+
+	spin_unlock_irqrestore(&ring->preempt_lock, flags);
+
+	gpu_write64(gpu,
+		REG_A6XX_CP_CONTEXT_SWITCH_SMMU_INFO,
+		a6xx_gpu->preempt_iova[ring->id] + PREEMPT_OFFSET_SMMU_INFO);
+
+	gpu_write64(gpu,
+		REG_A6XX_CP_CONTEXT_SWITCH_PRIV_NON_SECURE_RESTORE_ADDR,
+		a6xx_gpu->preempt_iova[ring->id] + PREEMPT_OFFSET_PRIV_NON_SECURE);
+
+	a6xx_gpu->next_ring = ring;
+
+	/* Start a timer to catch a stuck preemption */
+	mod_timer(&a6xx_gpu->preempt_timer, jiffies + msecs_to_jiffies(10000));
+
+	/* Set the preemption state to triggered */
+	set_preempt_state(a6xx_gpu, PREEMPT_TRIGGERED);
+
+	/* Trigger the preemption */
+	gpu_write(gpu, REG_A6XX_CP_CONTEXT_SWITCH_CNTL, cntl);
+}
+
+static int preempt_init_ring(struct a6xx_gpu *a6xx_gpu,
+		struct msm_ringbuffer *ring)
+{
+	struct adreno_gpu *adreno_gpu = &a6xx_gpu->base;
+	struct msm_gpu *gpu = &adreno_gpu->base;
+	struct drm_gem_object *bo = NULL;
+	phys_addr_t ttbr;
+	u64 iova = 0;
+	void *ptr;
+	int asid;
+
+	ptr = msm_gem_kernel_new(gpu->dev,
+		PREEMPT_SIZE(adreno_gpu->info->preempt_record_size),
+		MSM_BO_WC | MSM_BO_MAP_PRIV, gpu->aspace, &bo, &iova);
+
+	if (IS_ERR(ptr))
+		return PTR_ERR(ptr);
+
+	memset(ptr, 0, PREEMPT_SIZE(adreno_gpu->info->preempt_record_size));
+
+	msm_gem_object_set_name(bo, "preempt_record");
+
+	a6xx_gpu->preempt_bo[ring->id] = bo;
+	a6xx_gpu->preempt_iova[ring->id] = iova;
+	a6xx_gpu->preempt[ring->id] = ptr;
+
+	struct a7xx_cp_smmu_info *smmu_info_ptr = ptr + PREEMPT_OFFSET_SMMU_INFO;
+	struct a6xx_preempt_record *record_ptr = ptr + PREEMPT_OFFSET_PRIV_NON_SECURE;
+
+	msm_iommu_pagetable_params(gpu->aspace->mmu, &ttbr, &asid);
+
+	smmu_info_ptr->magic = GEN7_CP_SMMU_INFO_MAGIC;
+	smmu_info_ptr->ttbr0 = ttbr;
+	smmu_info_ptr->asid = 0xdecafbad;
+	smmu_info_ptr->context_idr = 0;
+
+	/* Set up the defaults on the preemption record */
+	record_ptr->magic = A6XX_PREEMPT_RECORD_MAGIC;
+	record_ptr->info = 0;
+	record_ptr->data = 0;
+	record_ptr->rptr = 0;
+	record_ptr->wptr = 0;
+	record_ptr->cntl = MSM_GPU_RB_CNTL_DEFAULT;
+	record_ptr->rbase = ring->iova;
+	record_ptr->counter = 0;
+	record_ptr->bv_rptr_addr = rbmemptr(ring, bv_rptr);
+
+	return 0;
+}
+
+void a6xx_preempt_fini(struct msm_gpu *gpu)
+{
+	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
+	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
+	int i;
+
+	for (i = 0; i < gpu->nr_rings; i++)
+		msm_gem_kernel_put(a6xx_gpu->preempt_bo[i], gpu->aspace);
+}
+
+void a6xx_preempt_init(struct msm_gpu *gpu)
+{
+	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
+	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
+	int i;
+
+	/* No preemption if we only have one ring */
+	if (gpu->nr_rings <= 1)
+		return;
+
+	for (i = 0; i < gpu->nr_rings; i++) {
+		if (preempt_init_ring(a6xx_gpu, gpu->rb[i]))
+			goto fail;
+	}
+
+	/* TODO: make this configurable? */
+	a6xx_gpu->preempt_level = 1;
+	a6xx_gpu->uses_gmem = 1;
+	a6xx_gpu->skip_save_restore = 1;
+
+	timer_setup(&a6xx_gpu->preempt_timer, a6xx_preempt_timer, 0);
+
+	return;
+fail:
+	/*
+	 * On any failure our adventure is over. Clean up and
+	 * set nr_rings to 1 to force preemption off
+	 */
+	a6xx_preempt_fini(gpu);
+	gpu->nr_rings = 1;
+
+	DRM_DEV_ERROR(&gpu->pdev->dev,
+				  "preemption init failed, disabling preemption\n");
+
+	return;
+}
diff --git a/drivers/gpu/drm/msm/msm_ringbuffer.h b/drivers/gpu/drm/msm/msm_ringbuffer.h
index 174f83137a49..d1e49f701c81 100644
--- a/drivers/gpu/drm/msm/msm_ringbuffer.h
+++ b/drivers/gpu/drm/msm/msm_ringbuffer.h
@@ -36,6 +36,7 @@ struct msm_rbmemptrs {
 
 	volatile struct msm_gpu_submit_stats stats[MSM_GPU_SUBMIT_STATS_COUNT];
 	volatile u64 ttbr0;
+	volatile u32 context_idr;
 };
 
 struct msm_cp_state {
@@ -101,6 +102,12 @@ struct msm_ringbuffer {
 	 */
 	spinlock_t preempt_lock;
 
+	/*
+	 * Whether we skipped writing wptr and it needs to be updated in the
+	 * future when the ring becomes current.
+	 */
+	bool restore_wptr;
+
 	/**
 	 * cur_ctx_seqno:
 	 *

-- 
2.46.0

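The lockless state machine documented in a6xx_gpu.h in the patch above can be
illustrated outside the kernel. The following is only a minimal userspace
sketch, assuming C11 <stdatomic.h> in place of the kernel's atomic_cmpxchg();
the state names match the patch, but struct demo_gpu and every demo_*
function are hypothetical stand-ins, not part of the driver. It shows why the
trigger path, the IRQ handler and the watchdog timer can race safely: each
transition only succeeds for the single caller that observes the expected
old state.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Same state names as the patch; everything else here is illustrative. */
enum preempt_state {
	PREEMPT_NONE = 0,
	PREEMPT_START,
	PREEMPT_FINISH,
	PREEMPT_TRIGGERED,
	PREEMPT_FAULTED,
	PREEMPT_PENDING,
};

/* Hypothetical stand-in for the preempt_state field of struct a6xx_gpu. */
struct demo_gpu {
	atomic_int preempt_state;
};

static void demo_init(struct demo_gpu *gpu)
{
	assert(gpu != NULL);
	atomic_init(&gpu->preempt_state, PREEMPT_NONE);
}

/* Move from 'old' to 'next' only if the current state is still 'old'. */
static bool demo_try_preempt_state(struct demo_gpu *gpu,
		enum preempt_state old, enum preempt_state next)
{
	int expected = old;

	return atomic_compare_exchange_strong(&gpu->preempt_state,
					      &expected, next);
}

/* Trigger path: true only for the caller that wins NONE -> START. */
static bool demo_trigger(struct demo_gpu *gpu)
{
	if (!demo_try_preempt_state(gpu, PREEMPT_NONE, PREEMPT_START))
		return false;	/* a preemption is already in flight */

	atomic_store(&gpu->preempt_state, PREEMPT_TRIGGERED);
	return true;
}

/* Completion interrupt: claims TRIGGERED, then walks back to NONE. */
static bool demo_irq(struct demo_gpu *gpu)
{
	if (!demo_try_preempt_state(gpu, PREEMPT_TRIGGERED, PREEMPT_PENDING))
		return false;

	atomic_store(&gpu->preempt_state, PREEMPT_FINISH);
	atomic_store(&gpu->preempt_state, PREEMPT_NONE);
	return true;
}

/* Watchdog: only faults if the interrupt has not claimed the state. */
static bool demo_timeout(struct demo_gpu *gpu)
{
	return demo_try_preempt_state(gpu, PREEMPT_TRIGGERED, PREEMPT_FAULTED);
}
```

Calling demo_trigger() twice in a row makes the second call return false, and
once demo_irq() has run a late demo_timeout() is a no-op. The sketch leaves
out the memory barriers, the eval_lock and the hardware register writes that
the real code needs.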

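The buffer layout that a7xx_patch_pwrup_reglist() builds in the patch above
(two static lists written back to back as offset/value pairs, followed by an
empty dynamic list) can be sketched in plain C as well. This is a simplified
illustration under stated assumptions: struct demo_lock is a cut-down,
hypothetical version of struct cpu_gpu_lock, the demo_read_fn callback stands
in for gpu_read(), and the capacity check is an addition for the demo rather
than something the patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified header; the real struct has more fields. */
struct demo_lock {
	uint8_t ifpc_list_len;		/* pairs in the IFPC-only list */
	uint8_t preemption_list_len;	/* pairs in the IFPC + preemption list */
	uint16_t dynamic_list_len;	/* triplets in the dynamic list */
	uint32_t regs[16];		/* (offset, value) pairs, back to back */
};

/* Hypothetical register read callback standing in for gpu_read(). */
typedef uint32_t (*demo_read_fn)(uint32_t offset);

/* Append one (offset, current value) pair per register; return new tail. */
static uint32_t *demo_append(uint32_t *dest, const uint32_t *offsets,
		size_t count, demo_read_fn read_reg)
{
	for (size_t i = 0; i < count; i++) {
		*dest++ = offsets[i];
		*dest++ = read_reg(offsets[i]);
	}

	return dest;
}

static void demo_patch_reglist(struct demo_lock *lock,
		const uint32_t *ifpc, size_t n_ifpc,
		const uint32_t *preempt, size_t n_preempt,
		demo_read_fn read_reg)
{
	uint32_t *dest = lock->regs;

	/* Each entry takes two dwords; refuse lists that would overflow. */
	assert(2 * (n_ifpc + n_preempt) <=
	       sizeof(lock->regs) / sizeof(lock->regs[0]));

	lock->ifpc_list_len = (uint8_t)n_ifpc;
	lock->preemption_list_len = (uint8_t)n_preempt;
	/* No dynamic (aperture, address, data) triplets in this sketch. */
	lock->dynamic_list_len = 0;

	dest = demo_append(dest, ifpc, n_ifpc, read_reg);
	demo_append(dest, preempt, n_preempt, read_reg);
}

/* Fake register file for the demo: the value is derived from the offset. */
static uint32_t demo_fake_read(uint32_t offset)
{
	return offset ^ 0xffffu;
}
```

With an IFPC list of two offsets and a preemption list of one, the buffer
holds three pairs in that order and the two length fields read 2 and 1, which
is the information the CP would need to split the buffer back into lists.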

* Re: [PATCH v4 05/11] drm/msm/A6xx: Implement preemption for A7XX targets
  2024-09-17 11:14 ` [PATCH v4 05/11] drm/msm/A6xx: Implement preemption for A7XX targets Antonino Maniscalco
@ 2024-09-20 16:37   ` Akhil P Oommen
  0 siblings, 0 replies; 35+ messages in thread
From: Akhil P Oommen @ 2024-09-20 16:37 UTC (permalink / raw)
  To: Antonino Maniscalco
  Cc: Rob Clark, Sean Paul, Konrad Dybcio, Abhinav Kumar,
	Dmitry Baryshkov, Marijn Suijten, David Airlie, Daniel Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet, linux-arm-msm, dri-devel, freedreno,
	linux-kernel, linux-doc, Sharat Masetty, Neil Armstrong

On Tue, Sep 17, 2024 at 01:14:15PM +0200, Antonino Maniscalco wrote:
> This patch implements preemption feature for A6xx targets, this allows
> the GPU to switch to a higher priority ringbuffer if one is ready. A6XX
> hardware as such supports multiple levels of preemption granularities,
> ranging from coarse grained(ringbuffer level) to a more fine grained
> such as draw-call level or a bin boundary level preemption. This patch
> enables the basic preemption level, with more fine grained preemption
> support to follow.
> 
> Signed-off-by: Sharat Masetty <smasetty@codeaurora.org>
> Signed-off-by: Antonino Maniscalco <antomani103@gmail.com>
> Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8650-QRD
> ---
>  drivers/gpu/drm/msm/Makefile              |   1 +
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c     | 283 +++++++++++++++++++++-
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.h     | 168 +++++++++++++
>  drivers/gpu/drm/msm/adreno/a6xx_preempt.c | 377 ++++++++++++++++++++++++++++++
>  drivers/gpu/drm/msm/msm_ringbuffer.h      |   7 +
>  5 files changed, 825 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/gpu/drm/msm/Makefile b/drivers/gpu/drm/msm/Makefile
> index f5e2838c6a76..32e915109a59 100644
> --- a/drivers/gpu/drm/msm/Makefile
> +++ b/drivers/gpu/drm/msm/Makefile
> @@ -23,6 +23,7 @@ adreno-y := \
>  	adreno/a6xx_gpu.o \
>  	adreno/a6xx_gmu.o \
>  	adreno/a6xx_hfi.o \
> +	adreno/a6xx_preempt.o \
>  
>  adreno-$(CONFIG_DEBUG_FS) += adreno/a5xx_debugfs.o \
>  
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> index 6e065500b64d..355a3e210335 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> @@ -16,6 +16,84 @@
>  
>  #define GPU_PAS_ID 13
>  
> +/* IFPC & Preemption static powerup restore list */
> +static const uint32_t a7xx_pwrup_reglist[] = {
> +	REG_A6XX_UCHE_TRAP_BASE,
> +	REG_A6XX_UCHE_TRAP_BASE + 1,
> +	REG_A6XX_UCHE_WRITE_THRU_BASE,
> +	REG_A6XX_UCHE_WRITE_THRU_BASE + 1,
> +	REG_A6XX_UCHE_GMEM_RANGE_MIN,
> +	REG_A6XX_UCHE_GMEM_RANGE_MIN + 1,
> +	REG_A6XX_UCHE_GMEM_RANGE_MAX,
> +	REG_A6XX_UCHE_GMEM_RANGE_MAX + 1,
> +	REG_A6XX_UCHE_CACHE_WAYS,
> +	REG_A6XX_UCHE_MODE_CNTL,
> +	REG_A6XX_RB_NC_MODE_CNTL,
> +	REG_A6XX_RB_CMP_DBG_ECO_CNTL,
> +	REG_A7XX_GRAS_NC_MODE_CNTL,
> +	REG_A6XX_RB_CONTEXT_SWITCH_GMEM_SAVE_RESTORE,
> +	REG_A6XX_UCHE_GBIF_GX_CONFIG,
> +	REG_A6XX_UCHE_CLIENT_PF,
> +	REG_A6XX_TPL1_DBG_ECO_CNTL1,
> +};
> +
> +static const uint32_t a7xx_ifpc_pwrup_reglist[] = {
> +	REG_A6XX_TPL1_NC_MODE_CNTL,
> +	REG_A6XX_SP_NC_MODE_CNTL,
> +	REG_A6XX_CP_DBG_ECO_CNTL,
> +	REG_A6XX_CP_PROTECT_CNTL,
> +	REG_A6XX_CP_PROTECT(0),
> +	REG_A6XX_CP_PROTECT(1),
> +	REG_A6XX_CP_PROTECT(2),
> +	REG_A6XX_CP_PROTECT(3),
> +	REG_A6XX_CP_PROTECT(4),
> +	REG_A6XX_CP_PROTECT(5),
> +	REG_A6XX_CP_PROTECT(6),
> +	REG_A6XX_CP_PROTECT(7),
> +	REG_A6XX_CP_PROTECT(8),
> +	REG_A6XX_CP_PROTECT(9),
> +	REG_A6XX_CP_PROTECT(10),
> +	REG_A6XX_CP_PROTECT(11),
> +	REG_A6XX_CP_PROTECT(12),
> +	REG_A6XX_CP_PROTECT(13),
> +	REG_A6XX_CP_PROTECT(14),
> +	REG_A6XX_CP_PROTECT(15),
> +	REG_A6XX_CP_PROTECT(16),
> +	REG_A6XX_CP_PROTECT(17),
> +	REG_A6XX_CP_PROTECT(18),
> +	REG_A6XX_CP_PROTECT(19),
> +	REG_A6XX_CP_PROTECT(20),
> +	REG_A6XX_CP_PROTECT(21),
> +	REG_A6XX_CP_PROTECT(22),
> +	REG_A6XX_CP_PROTECT(23),
> +	REG_A6XX_CP_PROTECT(24),
> +	REG_A6XX_CP_PROTECT(25),
> +	REG_A6XX_CP_PROTECT(26),
> +	REG_A6XX_CP_PROTECT(27),
> +	REG_A6XX_CP_PROTECT(28),
> +	REG_A6XX_CP_PROTECT(29),
> +	REG_A6XX_CP_PROTECT(30),
> +	REG_A6XX_CP_PROTECT(31),
> +	REG_A6XX_CP_PROTECT(32),
> +	REG_A6XX_CP_PROTECT(33),
> +	REG_A6XX_CP_PROTECT(34),
> +	REG_A6XX_CP_PROTECT(35),
> +	REG_A6XX_CP_PROTECT(36),
> +	REG_A6XX_CP_PROTECT(37),
> +	REG_A6XX_CP_PROTECT(38),
> +	REG_A6XX_CP_PROTECT(39),
> +	REG_A6XX_CP_PROTECT(40),
> +	REG_A6XX_CP_PROTECT(41),
> +	REG_A6XX_CP_PROTECT(42),
> +	REG_A6XX_CP_PROTECT(43),
> +	REG_A6XX_CP_PROTECT(44),
> +	REG_A6XX_CP_PROTECT(45),
> +	REG_A6XX_CP_PROTECT(46),
> +	REG_A6XX_CP_PROTECT(47),
> +	REG_A6XX_CP_AHB_CNTL,
> +};
> +
> +
>  static inline bool _a6xx_check_idle(struct msm_gpu *gpu)
>  {
>  	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> @@ -68,6 +146,8 @@ static void update_shadow_rptr(struct msm_gpu *gpu, struct msm_ringbuffer *ring)
>  
>  static void a6xx_flush(struct msm_gpu *gpu, struct msm_ringbuffer *ring)
>  {
> +	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> +	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
>  	uint32_t wptr;
>  	unsigned long flags;
>  
> @@ -81,12 +161,17 @@ static void a6xx_flush(struct msm_gpu *gpu, struct msm_ringbuffer *ring)
>  	/* Make sure to wrap wptr if we need to */
>  	wptr = get_wptr(ring);
>  
> -	spin_unlock_irqrestore(&ring->preempt_lock, flags);
> -
> -	/* Make sure everything is posted before making a decision */
> -	mb();
> +	/* Update HW if this is the current ring and we are not in preempt*/
> +	if (!a6xx_in_preempt(a6xx_gpu)) {
> +		if (a6xx_gpu->cur_ring == ring)
> +			gpu_write(gpu, REG_A6XX_CP_RB_WPTR, wptr);
> +		else
> +			ring->restore_wptr = true;
> +	} else {
> +		ring->restore_wptr = true;
> +	}
>  
> -	gpu_write(gpu, REG_A6XX_CP_RB_WPTR, wptr);
> +	spin_unlock_irqrestore(&ring->preempt_lock, flags);
>  }
>  
>  static void get_stats_counter(struct msm_ringbuffer *ring, u32 counter,
> @@ -138,12 +223,14 @@ static void a6xx_set_pagetable(struct a6xx_gpu *a6xx_gpu,
>  
>  	/*
>  	 * Write the new TTBR0 to the memstore. This is good for debugging.
> +	 * Needed for preemption
>  	 */
> -	OUT_PKT7(ring, CP_MEM_WRITE, 4);
> +	OUT_PKT7(ring, CP_MEM_WRITE, 5);
>  	OUT_RING(ring, CP_MEM_WRITE_0_ADDR_LO(lower_32_bits(memptr)));
>  	OUT_RING(ring, CP_MEM_WRITE_1_ADDR_HI(upper_32_bits(memptr)));
>  	OUT_RING(ring, lower_32_bits(ttbr));
> -	OUT_RING(ring, (asid << 16) | upper_32_bits(ttbr));
> +	OUT_RING(ring, upper_32_bits(ttbr));
> +	OUT_RING(ring, ctx->seqno);
>  
>  	/*
>  	 * Sync both threads after switching pagetables and enable BR only
> @@ -268,6 +355,34 @@ static void a6xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
>  	a6xx_flush(gpu, ring);
>  }
>  
> +static void a6xx_emit_set_pseudo_reg(struct msm_ringbuffer *ring,
> +		struct a6xx_gpu *a6xx_gpu, struct msm_gpu_submitqueue *queue)
> +{
> +	OUT_PKT7(ring, CP_SET_PSEUDO_REG, 12);
> +
> +	OUT_RING(ring, SMMU_INFO);
> +	/* don't save SMMU, we write the record from the kernel instead */
> +	OUT_RING(ring, 0);
> +	OUT_RING(ring, 0);
> +
> +	/* privileged and non secure buffer save */
> +	OUT_RING(ring, NON_SECURE_SAVE_ADDR);
> +	OUT_RING(ring, lower_32_bits(
> +		a6xx_gpu->preempt_iova[ring->id] + PREEMPT_OFFSET_PRIV_NON_SECURE));
> +	OUT_RING(ring, upper_32_bits(
> +		a6xx_gpu->preempt_iova[ring->id] + PREEMPT_OFFSET_PRIV_NON_SECURE));
> +
> +	/* user context buffer save, seems to be unnused by fw */
> +	OUT_RING(ring, NON_PRIV_SAVE_ADDR);
> +	OUT_RING(ring, 0);
> +	OUT_RING(ring, 0);
> +
> +	OUT_RING(ring, COUNTER);
> +	/* seems OK to set to 0 to disable it */
> +	OUT_RING(ring, 0);
> +	OUT_RING(ring, 0);
> +}
> +
>  static void a7xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
>  {
>  	unsigned int index = submit->seqno % MSM_GPU_SUBMIT_STATS_COUNT;
> @@ -285,6 +400,13 @@ static void a7xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
>  
>  	a6xx_set_pagetable(a6xx_gpu, ring, submit->queue->ctx);
>  
> +	/*
> +	 * If preemption is enabled, then set the pseudo register for the save
> +	 * sequence
> +	 */
> +	if (gpu->nr_rings > 1)
> +		a6xx_emit_set_pseudo_reg(ring, a6xx_gpu, submit->queue);
> +
>  	get_stats_counter(ring, REG_A7XX_RBBM_PERFCTR_CP(0),
>  		rbmemptr_stats(ring, index, cpcycles_start));
>  	get_stats_counter(ring, REG_A6XX_CP_ALWAYS_ON_COUNTER,
> @@ -376,6 +498,8 @@ static void a7xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
>  	OUT_RING(ring, upper_32_bits(rbmemptr(ring, bv_fence)));
>  	OUT_RING(ring, submit->seqno);
>  
> +	a6xx_gpu->last_seqno[ring->id] = submit->seqno;
> +
>  	/* write the ringbuffer timestamp */
>  	OUT_PKT7(ring, CP_EVENT_WRITE, 4);
>  	OUT_RING(ring, CACHE_CLEAN | CP_EVENT_WRITE_0_IRQ | BIT(27));
> @@ -389,10 +513,32 @@ static void a7xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
>  	OUT_PKT7(ring, CP_SET_MARKER, 1);
>  	OUT_RING(ring, 0x100); /* IFPC enable */
>  
> +	/* If preemption is enabled */
> +	if (gpu->nr_rings > 1) {
> +		/* Yield the floor on command completion */
> +		OUT_PKT7(ring, CP_CONTEXT_SWITCH_YIELD, 4);
> +
> +		/*
> +		 * If dword[2:1] are non zero, they specify an address for
> +		 * the CP to write the value of dword[3] to on preemption
> +		 * complete. Write 0 to skip the write
> +		 */
> +		OUT_RING(ring, 0x00);
> +		OUT_RING(ring, 0x00);
> +		/* Data value - not used if the address above is 0 */
> +		OUT_RING(ring, 0x01);
> +		/* generate interrupt on preemption completion */
> +		OUT_RING(ring, 0x00);
> +	}
> +
> +
>  	trace_msm_gpu_submit_flush(submit,
>  		gpu_read64(gpu, REG_A6XX_CP_ALWAYS_ON_COUNTER));
>  
>  	a6xx_flush(gpu, ring);
> +
> +	/* Check to see if we need to start preemption */
> +	a6xx_preempt_trigger(gpu);
>  }
>  
>  static void a6xx_set_hwcg(struct msm_gpu *gpu, bool state)
> @@ -588,6 +734,89 @@ static void a6xx_set_ubwc_config(struct msm_gpu *gpu)
>  		  adreno_gpu->ubwc_config.min_acc_len << 23 | hbb_lo << 21);
>  }
>  
> +static void a7xx_patch_pwrup_reglist(struct msm_gpu *gpu)
> +{
> +	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> +	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> +	struct adreno_reglist_list reglist[2];
> +	void *ptr = a6xx_gpu->pwrup_reglist_ptr;
> +	struct cpu_gpu_lock *lock = ptr;
> +	u32 *dest = (u32 *)&lock->regs[0];
> +	int i, j;
> +
> +	lock->gpu_req = lock->cpu_req = lock->turn = 0;
> +	lock->ifpc_list_len = ARRAY_SIZE(a7xx_ifpc_pwrup_reglist);
> +	lock->preemption_list_len = ARRAY_SIZE(a7xx_pwrup_reglist);
> +
> +	/* Static IFPC-only registers */
> +	reglist[0].regs = a7xx_ifpc_pwrup_reglist;
> +	reglist[0].count = ARRAY_SIZE(a7xx_ifpc_pwrup_reglist);
> +	lock->ifpc_list_len = reglist[0].count;
> +
> +	/* Static IFPC + preemption registers */
> +	reglist[1].regs = a7xx_pwrup_reglist;
> +	reglist[1].count = ARRAY_SIZE(a7xx_pwrup_reglist);
> +	lock->preemption_list_len = reglist[1].count;
> +
> +	/*
> +	 * For each entry in each of the lists, write the offset and the current
> +	 * register value into the GPU buffer
> +	 */
> +	for (i = 0; i < 2; i++) {
> +		const u32 *r = reglist[i].regs;
> +
> +		for (j = 0; j < reglist[i].count; j++) {
> +			*dest++ = r[j];
> +			*dest++ = gpu_read(gpu, r[j]);
> +		}
> +	}
> +
> +	/*
> +	 * The overall register list is composed of
> +	 * 1. Static IFPC-only registers
> +	 * 2. Static IFPC + preemption registers
> +	 * 3. Dynamic IFPC + preemption registers (ex: perfcounter selects)
> +	 *
> +	 * The first two lists are static. The sizes of these lists are stored
> +	 * as a number of pairs in ifpc_list_len and preemption_list_len
> +	 * respectively. With concurrent binning, some of the perfcounter
> +	 * registers are virtualized, so the CP needs to know the pipe id to
> +	 * program the aperture in order to restore them. Thus, the third list
> +	 * is a dynamic list of triplets
> +	 * (<aperture, shifted 12 bits> <address> <data>), and its length is
> +	 * stored as a number of triplets in dynamic_list_len.
> +	 */
> +	lock->dynamic_list_len = 0;
> +}
> +
> +static int a7xx_preempt_start(struct msm_gpu *gpu)
> +{
> +	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> +	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> +	struct msm_ringbuffer *ring = gpu->rb[0];
> +
> +	if (gpu->nr_rings <= 1)
> +		return 0;
> +
> +	/* Turn CP protection off */
> +	OUT_PKT7(ring, CP_SET_PROTECTED_MODE, 1);
> +	OUT_RING(ring, 0);
> +
> +	a6xx_emit_set_pseudo_reg(ring, a6xx_gpu, NULL);
> +
> +	/* Yield the floor on command completion */
> +	OUT_PKT7(ring, CP_CONTEXT_SWITCH_YIELD, 4);
> +	OUT_RING(ring, 0x00);
> +	OUT_RING(ring, 0x00);
> +	OUT_RING(ring, 0x00);
> +	/* Generate interrupt on preemption completion */
> +	OUT_RING(ring, 0x00);
> +
> +	a6xx_flush(gpu, ring);
> +
> +	return a6xx_idle(gpu, ring) ? 0 : -EINVAL;
> +}
> +
>  static int a6xx_cp_init(struct msm_gpu *gpu)
>  {
>  	struct msm_ringbuffer *ring = gpu->rb[0];
> @@ -619,6 +848,8 @@ static int a6xx_cp_init(struct msm_gpu *gpu)
>  
>  static int a7xx_cp_init(struct msm_gpu *gpu)
>  {
> +	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> +	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
>  	struct msm_ringbuffer *ring = gpu->rb[0];
>  	u32 mask;
>  
> @@ -656,11 +887,11 @@ static int a7xx_cp_init(struct msm_gpu *gpu)
>  
>  	/* *Don't* send a power up reg list for concurrent binning (TODO) */
>  	/* Lo address */
> -	OUT_RING(ring, 0x00000000);
> +	OUT_RING(ring, lower_32_bits(a6xx_gpu->pwrup_reglist_iova));
>  	/* Hi address */
> -	OUT_RING(ring, 0x00000000);
> +	OUT_RING(ring, upper_32_bits(a6xx_gpu->pwrup_reglist_iova));
>  	/* BIT(31) set => read the regs from the list */
> -	OUT_RING(ring, 0x00000000);
> +	OUT_RING(ring, BIT(31));
>  
>  	a6xx_flush(gpu, ring);
>  	return a6xx_idle(gpu, ring) ? 0 : -EINVAL;
> @@ -784,6 +1015,16 @@ static int a6xx_ucode_load(struct msm_gpu *gpu)
>  		msm_gem_object_set_name(a6xx_gpu->shadow_bo, "shadow");
>  	}
>  
> +	a6xx_gpu->pwrup_reglist_ptr = msm_gem_kernel_new(gpu->dev, PAGE_SIZE,
> +							 MSM_BO_WC  | MSM_BO_MAP_PRIV,
> +							 gpu->aspace, &a6xx_gpu->pwrup_reglist_bo,
> +							 &a6xx_gpu->pwrup_reglist_iova);
> +
> +	if (IS_ERR(a6xx_gpu->pwrup_reglist_ptr))
> +		return PTR_ERR(a6xx_gpu->pwrup_reglist_ptr);
> +
> +	msm_gem_object_set_name(a6xx_gpu->pwrup_reglist_bo, "pwrup_reglist");
> +
>  	return 0;
>  }
>  
> @@ -1128,6 +1369,8 @@ static int hw_init(struct msm_gpu *gpu)
>  	if (a6xx_gpu->shadow_bo) {
>  		gpu_write64(gpu, REG_A6XX_CP_RB_RPTR_ADDR,
>  			shadowptr(a6xx_gpu, gpu->rb[0]));
> +		for (unsigned int i = 0; i < gpu->nr_rings; i++)
> +			a6xx_gpu->shadow[i] = 0;
>  	}
>  
>  	/* ..which means "always" on A7xx, also for BV shadow */
> @@ -1136,6 +1379,8 @@ static int hw_init(struct msm_gpu *gpu)
>  			    rbmemptr(gpu->rb[0], bv_rptr));
>  	}
>  
> +	a6xx_preempt_hw_init(gpu);
> +
>  	/* Always come up on rb 0 */
>  	a6xx_gpu->cur_ring = gpu->rb[0];
>  
> @@ -1145,6 +1390,11 @@ static int hw_init(struct msm_gpu *gpu)
>  	/* Enable the SQE_to start the CP engine */
>  	gpu_write(gpu, REG_A6XX_CP_SQE_CNTL, 1);
>  
> +	if (adreno_is_a7xx(adreno_gpu) && !a6xx_gpu->pwrup_reglist_emitted) {
> +		a7xx_patch_pwrup_reglist(gpu);
> +		a6xx_gpu->pwrup_reglist_emitted = true;
> +	}
> +
>  	ret = adreno_is_a7xx(adreno_gpu) ? a7xx_cp_init(gpu) : a6xx_cp_init(gpu);
>  	if (ret)
>  		goto out;
> @@ -1182,6 +1432,10 @@ static int hw_init(struct msm_gpu *gpu)
>  out:
>  	if (adreno_has_gmu_wrapper(adreno_gpu))
>  		return ret;
> +
> +	/* Last step - yield the ringbuffer */
> +	a7xx_preempt_start(gpu);
> +
>  	/*
>  	 * Tell the GMU that we are done touching the GPU and it can start power
>  	 * management
> @@ -1559,8 +1813,13 @@ static irqreturn_t a6xx_irq(struct msm_gpu *gpu)
>  	if (status & A6XX_RBBM_INT_0_MASK_SWFUSEVIOLATION)
>  		a7xx_sw_fuse_violation_irq(gpu);
>  
> -	if (status & A6XX_RBBM_INT_0_MASK_CP_CACHE_FLUSH_TS)
> +	if (status & A6XX_RBBM_INT_0_MASK_CP_CACHE_FLUSH_TS) {
>  		msm_gpu_retire(gpu);
> +		a6xx_preempt_trigger(gpu);
> +	}
> +
> +	if (status & A6XX_RBBM_INT_0_MASK_CP_SW)
> +		a6xx_preempt_irq(gpu);
>  
>  	return IRQ_HANDLED;
>  }
> @@ -2333,6 +2592,8 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
>  				a6xx_fault_handler);
>  
>  	a6xx_calc_ubwc_config(adreno_gpu);
> +	/* Set up the preemption specific bits and pieces for each ringbuffer */
> +	a6xx_preempt_init(gpu);
>  
>  	return gpu;
>  }
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> index e3e5c53ae8af..7fc994121676 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> @@ -12,6 +12,31 @@
>  
>  extern bool hang_debug;
>  
> +struct cpu_gpu_lock {
> +	uint32_t gpu_req;
> +	uint32_t cpu_req;
> +	uint32_t turn;
> +	union {
> +		struct {
> +			uint16_t list_length;
> +			uint16_t list_offset;
> +		};
> +		struct {
> +			uint8_t ifpc_list_len;
> +			uint8_t preemption_list_len;
> +			uint16_t dynamic_list_len;
> +		};
> +	};
> +	uint64_t regs[62];
> +};
> +
> +struct adreno_reglist_list {
> +	/** @regs: List of registers **/
> +	const u32 *regs;
> +	/** @count: Number of registers in the list **/
> +	u32 count;
> +};
> +
>  /**
>   * struct a6xx_info - a6xx specific information from device table
>   *
> @@ -31,6 +56,20 @@ struct a6xx_gpu {
>  	uint64_t sqe_iova;
>  
>  	struct msm_ringbuffer *cur_ring;
> +	struct msm_ringbuffer *next_ring;
> +
> +	struct drm_gem_object *preempt_bo[MSM_GPU_MAX_RINGS];
> +	void *preempt[MSM_GPU_MAX_RINGS];
> +	uint64_t preempt_iova[MSM_GPU_MAX_RINGS];
> +	uint32_t last_seqno[MSM_GPU_MAX_RINGS];
> +
> +	atomic_t preempt_state;
> +	spinlock_t eval_lock;
> +	struct timer_list preempt_timer;
> +
> +	unsigned int preempt_level;
> +	bool uses_gmem;
> +	bool skip_save_restore;
>  
>  	struct a6xx_gmu gmu;
>  
> @@ -38,6 +77,11 @@ struct a6xx_gpu {
>  	uint64_t shadow_iova;
>  	uint32_t *shadow;
>  
> +	struct drm_gem_object *pwrup_reglist_bo;
> +	void *pwrup_reglist_ptr;
> +	uint64_t pwrup_reglist_iova;
> +	bool pwrup_reglist_emitted;
> +
>  	bool has_whereami;
>  
>  	void __iomem *llc_mmio;
> @@ -49,6 +93,102 @@ struct a6xx_gpu {
>  
>  #define to_a6xx_gpu(x) container_of(x, struct a6xx_gpu, base)
>  
> +/*
> + * In order to do lockless preemption we use a simple state machine to progress
> + * through the process.
> + *
> + * PREEMPT_NONE - no preemption in progress.  Next state START.
> + * PREEMPT_START - The trigger is evaluating if preemption is possible. Next
> + * states: TRIGGERED, NONE
> + * PREEMPT_FINISH - An intermediate state before moving back to NONE. Next
> + * state: NONE.
> + * PREEMPT_TRIGGERED: A preemption has been executed on the hardware. Next
> + * states: FAULTED, PENDING
> + * PREEMPT_FAULTED: A preemption timed out (never completed). This will trigger
> + * recovery.  Next state: N/A
> + * PREEMPT_PENDING: Preemption complete interrupt fired - the callback is
> + * checking the success of the operation. Next state: FAULTED, NONE.
> + */
> +
> +enum a6xx_preempt_state {
> +	PREEMPT_NONE = 0,
> +	PREEMPT_START,
> +	PREEMPT_FINISH,
> +	PREEMPT_TRIGGERED,
> +	PREEMPT_FAULTED,
> +	PREEMPT_PENDING,
> +};
> +
> +/*
> + * struct a6xx_preempt_record is a shared buffer between the microcode and the
> + * CPU to store the state for preemption. The record itself is much larger
> + * (2112k) but most of that is used by the CP for storage.
> + *
> + * There is a preemption record assigned per ringbuffer. When the CPU triggers a
> + * preemption, it fills out the record with the useful information (wptr, ring
> + * base, etc) and the microcode uses that information to set up the CP following
> + * the preemption.  When a ring is switched out, the CP will save the ringbuffer
> + * state back to the record. In this way, once the records are properly set up
> + * the CPU can quickly switch back and forth between ringbuffers by only
> + * updating a few registers (often only the wptr).
> + *
> + * These are the CPU aware registers in the record:
> + * @magic: Must always be 0xAE399D6EUL
> + * @info: Type of the record - written 0 by the CPU, updated by the CP
> + * @errno: preemption error record
> + * @data: Data field in YIELD and SET_MARKER packets, Written and used by CP
> + * @cntl: Value of RB_CNTL written by CPU, save/restored by CP
> + * @rptr: Value of RB_RPTR written by CPU, save/restored by CP
> + * @wptr: Value of RB_WPTR written by CPU, save/restored by CP
> + * @_pad: Reserved/padding
> + * @rptr_addr: Value of RB_RPTR_ADDR_LO|HI written by CPU, save/restored by CP
> + * @rbase: Value of RB_BASE written by CPU, save/restored by CP
> + * @counter: GPU address of the storage area for the preemption counters
> + * @bv_rptr_addr: Value of BV_RB_RPTR_ADDR_LO|HI written by CPU, save/restored by CP
> + */
> +struct a6xx_preempt_record {
> +	u32 magic;
> +	u32 info;
> +	u32 errno;
> +	u32 data;
> +	u32 cntl;
> +	u32 rptr;
> +	u32 wptr;
> +	u32 _pad;
> +	u64 rptr_addr;
> +	u64 rbase;
> +	u64 counter;
> +	u64 bv_rptr_addr;
> +};
> +
> +#define A6XX_PREEMPT_RECORD_MAGIC 0xAE399D6EUL
> +
> +#define PREEMPT_RECORD_SIZE_FALLBACK(size) \
> +	((size) == 0 ? 4192 * SZ_1K : (size))
> +
> +#define PREEMPT_OFFSET_SMMU_INFO 0
> +#define PREEMPT_OFFSET_PRIV_NON_SECURE (PREEMPT_OFFSET_SMMU_INFO + 4096)
> +#define PREEMPT_SIZE(size) \
> +	(PREEMPT_OFFSET_PRIV_NON_SECURE + PREEMPT_RECORD_SIZE_FALLBACK(size))
> +
> +/*
> + * The preemption counter block is a storage area for the value of the
> + * preemption counters that are saved immediately before context switch. We
> + * append it on to the end of the allocation for the preemption record.
> + */
> +#define A6XX_PREEMPT_COUNTER_SIZE (16 * 4)
> +
> +struct a7xx_cp_smmu_info {
> +	u32 magic;
> +	u32 _pad4;
> +	u64 ttbr0;
> +	u32 asid;
> +	u32 context_idr;
> +	u32 context_bank;
> +};
> +
> +#define GEN7_CP_SMMU_INFO_MAGIC 0x241350d5UL
> +
>  /*
>   * Given a register and a count, return a value to program into
>   * REG_CP_PROTECT_REG(n) - this will block both reads and writes for
> @@ -106,6 +246,34 @@ int a6xx_gmu_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node);
>  int a6xx_gmu_wrapper_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node);
>  void a6xx_gmu_remove(struct a6xx_gpu *a6xx_gpu);
>  
> +void a6xx_preempt_init(struct msm_gpu *gpu);
> +void a6xx_preempt_hw_init(struct msm_gpu *gpu);
> +void a6xx_preempt_trigger(struct msm_gpu *gpu);
> +void a6xx_preempt_irq(struct msm_gpu *gpu);
> +void a6xx_preempt_fini(struct msm_gpu *gpu);
> +int a6xx_preempt_submitqueue_setup(struct msm_gpu *gpu,
> +		struct msm_gpu_submitqueue *queue);
> +void a6xx_preempt_submitqueue_close(struct msm_gpu *gpu,
> +		struct msm_gpu_submitqueue *queue);
> +
> +/* Return true if we are in a preempt state */
> +static inline bool a6xx_in_preempt(struct a6xx_gpu *a6xx_gpu)
> +{
> +	/*
> +	 * Make sure the read to preempt_state is ordered with respect to reads
> +	 * of other variables before ...
> +	 */
> +	smp_rmb();
> +
> +	int preempt_state = atomic_read(&a6xx_gpu->preempt_state);
> +
> +	/* ... and after. */
> +	smp_rmb();
> +
> +	return !(preempt_state == PREEMPT_NONE ||
> +			preempt_state == PREEMPT_FINISH);
> +}
> +
>  void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp,
>  		       bool suspended);
>  unsigned long a6xx_gmu_get_freq(struct msm_gpu *gpu);
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_preempt.c b/drivers/gpu/drm/msm/adreno/a6xx_preempt.c
> new file mode 100644
> index 000000000000..aa4bad394f9e
> --- /dev/null
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_preempt.c
> @@ -0,0 +1,377 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (c) 2018, The Linux Foundation. All rights reserved. */
> +/* Copyright (c) 2023 Collabora, Ltd. */
> +/* Copyright (c) 2024 Valve Corporation */
> +
> +#include "msm_gem.h"
> +#include "a6xx_gpu.h"
> +#include "a6xx_gmu.xml.h"
> +#include "msm_mmu.h"
> +
> +/*
> + * Try to transition the preemption state from old to new. Return
> + * true on success or false if the original state wasn't 'old'
> + */
> +static inline bool try_preempt_state(struct a6xx_gpu *a6xx_gpu,
> +		enum a6xx_preempt_state old, enum a6xx_preempt_state new)
> +{
> +	enum a6xx_preempt_state cur = atomic_cmpxchg(&a6xx_gpu->preempt_state,
> +		old, new);
> +
> +	return (cur == old);
> +}
> +
> +/*
> + * Force the preemption state to the specified state.  This is used in cases
> + * where the current state is known and won't change
> + */
> +static inline void set_preempt_state(struct a6xx_gpu *gpu,
> +		enum a6xx_preempt_state new)
> +{
> +	/*
> +	 * preempt_state may be read by other cores trying to trigger a
> +	 * preemption or in the interrupt handler so barriers are needed
> +	 * before...
> +	 */
> +	smp_mb__before_atomic();
> +	atomic_set(&gpu->preempt_state, new);
> +	/* ... and after */
> +	smp_mb__after_atomic();
> +}
> +
> +/* Write the most recent wptr for the given ring into the hardware */
> +static inline void update_wptr(struct msm_gpu *gpu, struct msm_ringbuffer *ring)
> +{
> +	unsigned long flags;
> +	uint32_t wptr;
> +
> +	spin_lock_irqsave(&ring->preempt_lock, flags);
> +
> +	if (ring->restore_wptr) {
> +		wptr = get_wptr(ring);
> +
> +		gpu_write(gpu, REG_A6XX_CP_RB_WPTR, wptr);
> +
> +		ring->restore_wptr = false;
> +	}
> +
> +	spin_unlock_irqrestore(&ring->preempt_lock, flags);
> +}
> +
> +/* Return the highest priority ringbuffer with something in it */
> +static struct msm_ringbuffer *get_next_ring(struct msm_gpu *gpu)
> +{
> +	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> +	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> +
> +	unsigned long flags;
> +	int i;
> +
> +	for (i = 0; i < gpu->nr_rings; i++) {
> +		bool empty;
> +		struct msm_ringbuffer *ring = gpu->rb[i];
> +
> +		spin_lock_irqsave(&ring->preempt_lock, flags);
> +		empty = (get_wptr(ring) == gpu->funcs->get_rptr(gpu, ring));
> +		if (!empty && ring == a6xx_gpu->cur_ring)
> +			empty = ring->memptrs->fence == a6xx_gpu->last_seqno[i];
> +		spin_unlock_irqrestore(&ring->preempt_lock, flags);
> +
> +		if (!empty)
> +			return ring;
> +	}
> +
> +	return NULL;
> +}
> +
> +static void a6xx_preempt_timer(struct timer_list *t)
> +{
> +	struct a6xx_gpu *a6xx_gpu = from_timer(a6xx_gpu, t, preempt_timer);
> +	struct msm_gpu *gpu = &a6xx_gpu->base.base;
> +	struct drm_device *dev = gpu->dev;
> +
> +	if (!try_preempt_state(a6xx_gpu, PREEMPT_TRIGGERED, PREEMPT_FAULTED))
> +		return;
> +
> +	dev_err(dev->dev, "%s: preemption timed out\n", gpu->name);
> +	kthread_queue_work(gpu->worker, &gpu->recover_work);
> +}
> +
> +void a6xx_preempt_irq(struct msm_gpu *gpu)
> +{
> +	uint32_t status;
> +	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> +	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> +	struct drm_device *dev = gpu->dev;
> +
> +	if (!try_preempt_state(a6xx_gpu, PREEMPT_TRIGGERED, PREEMPT_PENDING))
> +		return;
> +
> +	/* Delete the preemption watchdog timer */
> +	del_timer(&a6xx_gpu->preempt_timer);
> +
> +	/*
> +	 * The hardware should be setting the stop bit of CP_CONTEXT_SWITCH_CNTL
> +	 * to zero before firing the interrupt, but there is a non zero chance
> +	 * of a hardware condition or a software race that could set it again
> +	 * before we have a chance to finish. If that happens, log and go for
> +	 * recovery
> +	 */
> +	status = gpu_read(gpu, REG_A6XX_CP_CONTEXT_SWITCH_CNTL);
> +	if (unlikely(status & A6XX_CP_CONTEXT_SWITCH_CNTL_STOP)) {
> +		DRM_DEV_ERROR(&gpu->pdev->dev,
> +					  "!!!!!!!!!!!!!!!! preemption faulted !!!!!!!!!!!!!! irq\n");
> +		set_preempt_state(a6xx_gpu, PREEMPT_FAULTED);
> +		dev_err(dev->dev, "%s: Preemption failed to complete\n",
> +			gpu->name);
> +		kthread_queue_work(gpu->worker, &gpu->recover_work);
> +		return;
> +	}
> +
> +	a6xx_gpu->cur_ring = a6xx_gpu->next_ring;
> +	a6xx_gpu->next_ring = NULL;
> +
> +	set_preempt_state(a6xx_gpu, PREEMPT_FINISH);
> +
> +	update_wptr(gpu, a6xx_gpu->cur_ring);
> +
> +	set_preempt_state(a6xx_gpu, PREEMPT_NONE);
> +
> +	/*
> +	 * Retrigger preemption to avoid a deadlock that might occur when preemption
> +	 * is skipped due to it being already in flight when requested.
> +	 */
> +	a6xx_preempt_trigger(gpu);
> +}
> +
> +void a6xx_preempt_hw_init(struct msm_gpu *gpu)
> +{
> +	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> +	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> +	int i;
> +
> +	/* No preemption if we only have one ring */
> +	if (gpu->nr_rings == 1)
> +		return;
> +
> +	for (i = 0; i < gpu->nr_rings; i++) {
> +		struct a6xx_preempt_record *record_ptr =
> +			a6xx_gpu->preempt[i] + PREEMPT_OFFSET_PRIV_NON_SECURE;
> +		record_ptr->wptr = 0;
> +		record_ptr->rptr = 0;
> +		record_ptr->rptr_addr = shadowptr(a6xx_gpu, gpu->rb[i]);
> +		record_ptr->info = 0;
> +		record_ptr->data = 0;
> +		record_ptr->rbase = gpu->rb[i]->iova;
> +	}
> +
> +	/* Write a 0 to signal that we aren't switching pagetables */
> +	gpu_write64(gpu, REG_A6XX_CP_CONTEXT_SWITCH_SMMU_INFO, 0);
> +
> +	/* Enable the GMEM save/restore feature for preemption */
> +	gpu_write(gpu, REG_A6XX_RB_CONTEXT_SWITCH_GMEM_SAVE_RESTORE, 0x1);
> +
> +	/* Reset the preemption state */
> +	set_preempt_state(a6xx_gpu, PREEMPT_NONE);
> +
> +	spin_lock_init(&a6xx_gpu->eval_lock);
> +
> +	/* Always come up on rb 0 */
> +	a6xx_gpu->cur_ring = gpu->rb[0];
> +}
> +
> +void a6xx_preempt_trigger(struct msm_gpu *gpu)
> +{
> +	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> +	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> +	unsigned long flags;
> +	struct msm_ringbuffer *ring;
> +	unsigned int cntl;
> +
> +	if (gpu->nr_rings == 1)
> +		return;
> +
> +	/*
> +	 * Lock to make sure another thread attempting preemption doesn't skip it
> +	 * while we are still evaluating the next ring. This makes sure the other
> +	 * thread does start preemption if we abort it and avoids a soft lock.
> +	 */
> +	spin_lock_irqsave(&a6xx_gpu->eval_lock, flags);
> +
> +	/*
> +	 * Try to start preemption by moving from NONE to START. If
> +	 * unsuccessful, a preemption is already in flight
> +	 */
> +	if (!try_preempt_state(a6xx_gpu, PREEMPT_NONE, PREEMPT_START)) {
> +		spin_unlock_irqrestore(&a6xx_gpu->eval_lock, flags);
> +		return;
> +	}
> +
> +	cntl = A6XX_CP_CONTEXT_SWITCH_CNTL_LEVEL(a6xx_gpu->preempt_level);
> +
> +	if (a6xx_gpu->skip_save_restore)
> +		cntl |= A6XX_CP_CONTEXT_SWITCH_CNTL_SKIP_SAVE_RESTORE;
> +
> +	if (a6xx_gpu->uses_gmem)
> +		cntl |= A6XX_CP_CONTEXT_SWITCH_CNTL_USES_GMEM;
> +
> +	cntl |= A6XX_CP_CONTEXT_SWITCH_CNTL_STOP;
> +
> +	/* Get the next ring to preempt to */
> +	ring = get_next_ring(gpu);
> +
> +	/*
> +	 * If no ring is populated or the highest priority ring is the current
> +	 * one do nothing except to update the wptr to the latest and greatest
> +	 */
> +	if (!ring || (a6xx_gpu->cur_ring == ring)) {
> +		set_preempt_state(a6xx_gpu, PREEMPT_FINISH);
> +		update_wptr(gpu, a6xx_gpu->cur_ring);
> +		set_preempt_state(a6xx_gpu, PREEMPT_NONE);
> +		spin_unlock_irqrestore(&a6xx_gpu->eval_lock, flags);
> +		return;
> +	}
> +
> +	spin_unlock_irqrestore(&a6xx_gpu->eval_lock, flags);
> +
> +	spin_lock_irqsave(&ring->preempt_lock, flags);
> +
> +	struct a7xx_cp_smmu_info *smmu_info_ptr =
> +		a6xx_gpu->preempt[ring->id] + PREEMPT_OFFSET_SMMU_INFO;
> +	struct a6xx_preempt_record *record_ptr =
> +		a6xx_gpu->preempt[ring->id] + PREEMPT_OFFSET_PRIV_NON_SECURE;
> +	u64 ttbr0 = ring->memptrs->ttbr0;
> +	u32 context_idr = ring->memptrs->context_idr;
> +
> +	smmu_info_ptr->ttbr0 = ttbr0;
> +	smmu_info_ptr->context_idr = context_idr;
> +	record_ptr->wptr = get_wptr(ring);
> +
> +	/*
> +	 * The GPU will write the wptr we set above when we preempt. Reset
> +	 * restore_wptr to make sure that we don't write WPTR to the same
> +	 * thing twice. It's still possible subsequent submissions will update
> +	 * wptr again, in which case they will set the flag to true. This has
> +	 * to be protected by the lock for setting the flag and updating wptr
> +	 * to be atomic.
> +	 */
> +	ring->restore_wptr = false;
> +
> +	spin_unlock_irqrestore(&ring->preempt_lock, flags);
> +
> +	gpu_write64(gpu,
> +		REG_A6XX_CP_CONTEXT_SWITCH_SMMU_INFO,
> +		a6xx_gpu->preempt_iova[ring->id] + PREEMPT_OFFSET_SMMU_INFO);
> +
> +	gpu_write64(gpu,
> +		REG_A6XX_CP_CONTEXT_SWITCH_PRIV_NON_SECURE_RESTORE_ADDR,
> +		a6xx_gpu->preempt_iova[ring->id] + PREEMPT_OFFSET_PRIV_NON_SECURE);
> +
> +	a6xx_gpu->next_ring = ring;
> +
> +	/* Start a timer to catch a stuck preemption */
> +	mod_timer(&a6xx_gpu->preempt_timer, jiffies + msecs_to_jiffies(10000));
> +
> +	/* Set the preemption state to triggered */
> +	set_preempt_state(a6xx_gpu, PREEMPT_TRIGGERED);
> +
> +	/* Trigger the preemption */
> +	gpu_write(gpu, REG_A6XX_CP_CONTEXT_SWITCH_CNTL, cntl);
> +}
> +
> +static int preempt_init_ring(struct a6xx_gpu *a6xx_gpu,
> +		struct msm_ringbuffer *ring)
> +{
> +	struct adreno_gpu *adreno_gpu = &a6xx_gpu->base;
> +	struct msm_gpu *gpu = &adreno_gpu->base;
> +	struct drm_gem_object *bo = NULL;
> +	phys_addr_t ttbr;
> +	u64 iova = 0;
> +	void *ptr;
> +	int asid;
> +
> +	ptr = msm_gem_kernel_new(gpu->dev,
> +		PREEMPT_SIZE(adreno_gpu->info->preempt_record_size),
> +		MSM_BO_WC | MSM_BO_MAP_PRIV, gpu->aspace, &bo, &iova);
> +
> +	if (IS_ERR(ptr))
> +		return PTR_ERR(ptr);
> +
> +	memset(ptr, 0, PREEMPT_SIZE(adreno_gpu->info->preempt_record_size));
> +
> +	msm_gem_object_set_name(bo, "preempt_record");

I wish we could add the ring id to the name too. Anyway,

Reviewed-by: Akhil P Oommen <quic_akhilpo@quicinc.com>

-Akhil
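
For illustration only (this is not part of the patch or of the review), a
per-ring name could be built as sketched below. format_preempt_bo_name() and
the "preempt_record ring%d" format are hypothetical. The sketch is plain
userspace C so it can be compiled on its own. In the driver,
msm_gem_object_set_name() appears to accept a printf-style format string, so
the ring id could probably be passed to it directly, but that is an
assumption to check against msm_gem.c.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical helper, not from the patch: formats a buffer-object name
 * that carries the ring id. Returns the snprintf() result, i.e. the number
 * of characters that would have been written without truncation.
 */
static int format_preempt_bo_name(char *buf, size_t len, int ring_id)
{
	return snprintf(buf, len, "preempt_record ring%d", ring_id);
}
```

With a name like this, each ring's preemption record would be
distinguishable in debugfs-style GEM listings instead of every ring showing
the same "preempt_record" label.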

> +
> +	a6xx_gpu->preempt_bo[ring->id] = bo;
> +	a6xx_gpu->preempt_iova[ring->id] = iova;
> +	a6xx_gpu->preempt[ring->id] = ptr;
> +
> +	struct a7xx_cp_smmu_info *smmu_info_ptr = ptr + PREEMPT_OFFSET_SMMU_INFO;
> +	struct a6xx_preempt_record *record_ptr = ptr + PREEMPT_OFFSET_PRIV_NON_SECURE;
> +
> +	msm_iommu_pagetable_params(gpu->aspace->mmu, &ttbr, &asid);
> +
> +	smmu_info_ptr->magic = GEN7_CP_SMMU_INFO_MAGIC;
> +	smmu_info_ptr->ttbr0 = ttbr;
> +	smmu_info_ptr->asid = 0xdecafbad;
> +	smmu_info_ptr->context_idr = 0;
> +
> +	/* Set up the defaults on the preemption record */
> +	record_ptr->magic = A6XX_PREEMPT_RECORD_MAGIC;
> +	record_ptr->info = 0;
> +	record_ptr->data = 0;
> +	record_ptr->rptr = 0;
> +	record_ptr->wptr = 0;
> +	record_ptr->cntl = MSM_GPU_RB_CNTL_DEFAULT;
> +	record_ptr->rbase = ring->iova;
> +	record_ptr->counter = 0;
> +	record_ptr->bv_rptr_addr = rbmemptr(ring, bv_rptr);
> +
> +	return 0;
> +}
> +
> +void a6xx_preempt_fini(struct msm_gpu *gpu)
> +{
> +	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> +	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> +	int i;
> +
> +	for (i = 0; i < gpu->nr_rings; i++)
> +		msm_gem_kernel_put(a6xx_gpu->preempt_bo[i], gpu->aspace);
> +}
> +
> +void a6xx_preempt_init(struct msm_gpu *gpu)
> +{
> +	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> +	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> +	int i;
> +
> +	/* No preemption if we only have one ring */
> +	if (gpu->nr_rings <= 1)
> +		return;
> +
> +	for (i = 0; i < gpu->nr_rings; i++) {
> +		if (preempt_init_ring(a6xx_gpu, gpu->rb[i]))
> +			goto fail;
> +	}
> +
> +	/* TODO: make this configurable? */
> +	a6xx_gpu->preempt_level = 1;
> +	a6xx_gpu->uses_gmem = 1;
> +	a6xx_gpu->skip_save_restore = 1;
> +
> +	timer_setup(&a6xx_gpu->preempt_timer, a6xx_preempt_timer, 0);
> +
> +	return;
> +fail:
> +	/*
> +	 * On any failure our adventure is over. Clean up and
> +	 * set nr_rings to 1 to force preemption off
> +	 */
> +	a6xx_preempt_fini(gpu);
> +	gpu->nr_rings = 1;
> +
> +	DRM_DEV_ERROR(&gpu->pdev->dev,
> +				  "preemption init failed, disabling preemption\n");
> +
> +	return;
> +}
> diff --git a/drivers/gpu/drm/msm/msm_ringbuffer.h b/drivers/gpu/drm/msm/msm_ringbuffer.h
> index 174f83137a49..d1e49f701c81 100644
> --- a/drivers/gpu/drm/msm/msm_ringbuffer.h
> +++ b/drivers/gpu/drm/msm/msm_ringbuffer.h
> @@ -36,6 +36,7 @@ struct msm_rbmemptrs {
>  
>  	volatile struct msm_gpu_submit_stats stats[MSM_GPU_SUBMIT_STATS_COUNT];
>  	volatile u64 ttbr0;
> +	volatile u32 context_idr;
>  };
>  
>  struct msm_cp_state {
> @@ -101,6 +102,12 @@ struct msm_ringbuffer {
>  	 */
>  	spinlock_t preempt_lock;
>  
> +	/*
> +	 * Whether we skipped writing wptr and it needs to be updated in the
> +	 * future when the ring becomes current.
> +	 */
> +	bool restore_wptr;
> +
>  	/**
>  	 * cur_ctx_seqno:
>  	 *
> 
> -- 
> 2.46.0
> 
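
As a reading aid for the state machine documented in a6xx_gpu.h above, here
is a minimal userspace model of the transitions. It is a sketch under stated
assumptions, not the driver code: C11 atomics stand in for the kernel's
atomic_t and barriers, and struct preempt_model plus the model_*() helpers
are hypothetical names that only mirror try_preempt_state(),
set_preempt_state() and a6xx_in_preempt().

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Same values as enum a6xx_preempt_state in the patch. */
enum preempt_state {
	PREEMPT_NONE = 0,
	PREEMPT_START,
	PREEMPT_FINISH,
	PREEMPT_TRIGGERED,
	PREEMPT_FAULTED,
	PREEMPT_PENDING,
};

/* Hypothetical stand-in for the preempt_state field of struct a6xx_gpu. */
struct preempt_model {
	atomic_int state;
};

/* Move old -> next only if the state is still 'old' (compare-and-swap). */
static bool model_try_state(struct preempt_model *m,
			    enum preempt_state old, enum preempt_state next)
{
	int expected = old;

	return atomic_compare_exchange_strong(&m->state, &expected, next);
}

/* Unconditional store, used when the current state is already known. */
static void model_set_state(struct preempt_model *m, enum preempt_state s)
{
	atomic_store(&m->state, s);
}

/* True while a preemption is being evaluated or is in flight. */
static bool model_in_preempt(struct preempt_model *m)
{
	int s = atomic_load(&m->state);

	return !(s == PREEMPT_NONE || s == PREEMPT_FINISH);
}
```

The point of the compare-and-swap is that the timer and the interrupt
handler both try to leave TRIGGERED: whichever wins moves the state to
FAULTED or PENDING, and the loser sees a failed transition and returns
without touching the hardware.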

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v4 06/11] drm/msm/A6xx: Sync relevant adreno_pm4.xml changes
  2024-09-17 11:14 [PATCH v4 00/11] Preemption support for A7XX Antonino Maniscalco
                   ` (4 preceding siblings ...)
  2024-09-17 11:14 ` [PATCH v4 05/11] drm/msm/A6xx: Implement preemption for A7XX targets Antonino Maniscalco
@ 2024-09-17 11:14 ` Antonino Maniscalco
  2024-09-17 11:14 ` [PATCH v4 07/11] drm/msm/A6xx: Use posamble to reset counters on preemption Antonino Maniscalco
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 35+ messages in thread
From: Antonino Maniscalco @ 2024-09-17 11:14 UTC (permalink / raw)
  To: Rob Clark, Sean Paul, Konrad Dybcio, Abhinav Kumar,
	Dmitry Baryshkov, Marijn Suijten, David Airlie, Daniel Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet
  Cc: linux-arm-msm, dri-devel, freedreno, linux-kernel, linux-doc,
	Antonino Maniscalco

In mesa CP_SET_CTXSWITCH_IB is renamed to CP_SET_AMBLE and some other
names are changed to match KGSL. Import those changes.

The changes have not been merged yet in mesa but are necessary for this
series.

Signed-off-by: Antonino Maniscalco <antomani103@gmail.com>
---
 .../gpu/drm/msm/registers/adreno/adreno_pm4.xml    | 39 ++++++++++------------
 1 file changed, 17 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/msm/registers/adreno/adreno_pm4.xml b/drivers/gpu/drm/msm/registers/adreno/adreno_pm4.xml
index cab01af55d22..55a35182858c 100644
--- a/drivers/gpu/drm/msm/registers/adreno/adreno_pm4.xml
+++ b/drivers/gpu/drm/msm/registers/adreno/adreno_pm4.xml
@@ -581,8 +581,7 @@ xsi:schemaLocation="https://gitlab.freedesktop.org/freedreno/ rules-fd.xsd">
                 and forcibly switch to the indicated context.
 	</doc>
 	<value name="CP_CONTEXT_SWITCH" value="0x54" variants="A6XX"/>
-	<!-- Note, kgsl calls this CP_SET_AMBLE: -->
-	<value name="CP_SET_CTXSWITCH_IB" value="0x55" variants="A6XX-"/>
+	<value name="CP_SET_AMBLE" value="0x55" variants="A6XX-"/>
 
 	<!--
 	Seems to always have the payload:
@@ -2013,42 +2012,38 @@ opcode: CP_LOAD_STATE4 (30) (4 dwords)
 	</reg32>
 </domain>
 
-<domain name="CP_SET_CTXSWITCH_IB" width="32">
+<domain name="CP_SET_AMBLE" width="32">
 	<doc>
-                Used by the userspace driver to set various IB's which are
-                executed during context save/restore for handling
-                state that isn't restored by the
-                context switch routine itself.
-	</doc>
-	<enum name="ctxswitch_ib">
-		<value name="RESTORE_IB" value="0">
+                Used by the userspace and kernel drivers to set various IB's
+                which are executed during context save/restore for handling
+                state that isn't restored by the context switch routine itself.
+  </doc>
+	<enum name="amble_type">
+		<value name="PREAMBLE_AMBLE_TYPE" value="0">
 			<doc>Executed unconditionally when switching back to the context.</doc>
 		</value>
-		<value name="YIELD_RESTORE_IB" value="1">
+		<value name="BIN_PREAMBLE_AMBLE_TYPE" value="1">
                         <doc>
 				Executed when switching back after switching
 				away during execution of
-				a CP_SET_MARKER packet with RM6_YIELD as the
-				payload *and* the normal save routine was
-				bypassed for a shorter one. I think this is
-				connected to the "skipsaverestore" bit set by
-				the kernel when preempting.
+				a CP_SET_MARKER packet with RM6_BIN_RENDER_END as the
+				payload *and* skipsaverestore is set. This is
+				expected to restore static register values not
+				saved when skipsaverestore is set.
 			</doc>
 		</value>
-		<value name="SAVE_IB" value="2">
+		<value name="POSTAMBLE_AMBLE_TYPE" value="2">
                         <doc>
 				Executed when switching away from the context,
 				except for context switches initiated via
 				CP_YIELD.
                         </doc>
 		</value>
-		<value name="RB_SAVE_IB" value="3">
+		<value name="KMD_AMBLE_TYPE" value="3">
 			<doc>
 				This can only be set by the RB (i.e. the kernel)
 				and executes with protected mode off, but
-				is otherwise similar to SAVE_IB.
-
-				Note, kgsl calls this CP_KMD_AMBLE_TYPE
+				is otherwise similar to POSTAMBLE_AMBLE_TYPE.
 			</doc>
 		</value>
 	</enum>
@@ -2060,7 +2055,7 @@ opcode: CP_LOAD_STATE4 (30) (4 dwords)
 	</reg32>
 	<reg32 offset="2" name="2">
 		<bitfield name="DWORDS" low="0" high="19" type="uint"/>
-		<bitfield name="TYPE" low="20" high="21" type="ctxswitch_ib"/>
+		<bitfield name="TYPE" low="20" high="21" type="amble_type"/>
 	</reg32>
 </domain>
 

-- 
2.46.0



* [PATCH v4 07/11] drm/msm/A6xx: Use posamble to reset counters on preemption
  2024-09-17 11:14 [PATCH v4 00/11] Preemption support for A7XX Antonino Maniscalco
                   ` (5 preceding siblings ...)
  2024-09-17 11:14 ` [PATCH v4 06/11] drm/msm/A6xx: Sync relevant adreno_pm4.xml changes Antonino Maniscalco
@ 2024-09-17 11:14 ` Antonino Maniscalco
  2024-09-20 16:43   ` Akhil P Oommen
  2024-09-17 11:14 ` [PATCH v4 08/11] drm/msm/A6xx: Add traces for preemption Antonino Maniscalco
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 35+ messages in thread
From: Antonino Maniscalco @ 2024-09-17 11:14 UTC (permalink / raw)
  To: Rob Clark, Sean Paul, Konrad Dybcio, Abhinav Kumar,
	Dmitry Baryshkov, Marijn Suijten, David Airlie, Daniel Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet
  Cc: linux-arm-msm, dri-devel, freedreno, linux-kernel, linux-doc,
	Antonino Maniscalco

Use the postamble to reset perf counters when switching between rings,
except when sysprof is enabled, analogously to how they are reset
between submissions when switching pagetables.

Signed-off-by: Antonino Maniscalco <antomani103@gmail.com>
---
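Not part of the commit: a minimal, self-contained sketch of how the
postamble buffer is filled and later neutralised by overwriting its first
header with a NOP that spans the remaining dwords. It is meant only for
checking the dword arithmetic outside the kernel. The opcode values, the
WRITE_EQ value, the two register offsets and every name ending in
`_sketch` are assumptions for illustration; the header layout mirrors the
PKT7() macro added by this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration; the real ones come from adreno_pm4.xml. */
#define CP_TYPE7_PKT     (7u << 28)
#define CP_NOP           0x10u
#define CP_REG_RMW       0x21u
#define CP_WAIT_REG_MEM  0x3cu
#define WRITE_EQ_ASSUMED 3u

/* Hypothetical register offsets, placeholders only. */
#define REG_SRAM_INIT_CMD_SKETCH    0x1000u
#define REG_SRAM_INIT_STATUS_SKETCH 0x1001u

/* Returns 1 when val has an even number of set bits, 0 otherwise. */
static uint32_t pm4_parity(uint32_t val)
{
	return (0x9669u >> (0xfu & (val ^ (val >> 4) ^ (val >> 8) ^
		(val >> 12) ^ (val >> 16) ^ (val >> 20) ^
		(val >> 24) ^ (val >> 28)))) & 1u;
}

/* Type-7 header: payload count in the low bits, opcode in bits 16..22. */
static uint32_t pkt7(uint32_t opcode, uint32_t cnt)
{
	return CP_TYPE7_PKT | (cnt << 0) | (pm4_parity(cnt) << 15) |
		((opcode & 0x7fu) << 16) | (pm4_parity(opcode) << 23);
}

/* Fill buf with the two packets; returns the number of dwords written. */
static uint32_t postamble_prepare_sketch(uint32_t *buf)
{
	uint32_t count = 0;

	/* Header plus three payload dwords, same values as in the patch. */
	buf[count++] = pkt7(CP_REG_RMW, 3);
	buf[count++] = REG_SRAM_INIT_CMD_SKETCH;
	buf[count++] = 0;
	buf[count++] = 1;

	/* Header plus six payload dwords: poll the status register. */
	buf[count++] = pkt7(CP_WAIT_REG_MEM, 6);
	buf[count++] = WRITE_EQ_ASSUMED;            /* compare function */
	buf[count++] = REG_SRAM_INIT_STATUS_SKETCH; /* poll address, low */
	buf[count++] = 0;                           /* poll address, high */
	buf[count++] = 0x1;                         /* reference value */
	buf[count++] = 0x1;                         /* mask */
	buf[count++] = 0;                           /* delay loop cycles */

	return count;
}

/* Turn the whole buffer into one NOP: header plus (len - 1) payload dwords. */
static void postamble_disable_sketch(uint32_t *buf, uint32_t len)
{
	assert(len > 0);
	buf[0] = pkt7(CP_NOP, len - 1);
}
```

Only the first dword changes when the postamble is disabled, so
re-enabling it amounts to rewriting the buffer with the prepare step.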
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c     | 12 +++++++
 drivers/gpu/drm/msm/adreno/a6xx_gpu.h     |  6 ++++
 drivers/gpu/drm/msm/adreno/a6xx_preempt.c | 57 +++++++++++++++++++++++++++++++
 drivers/gpu/drm/msm/adreno/adreno_gpu.h   |  7 ++--
 4 files changed, 80 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index 355a3e210335..736f475d696f 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -358,6 +358,8 @@ static void a6xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
 static void a6xx_emit_set_pseudo_reg(struct msm_ringbuffer *ring,
 		struct a6xx_gpu *a6xx_gpu, struct msm_gpu_submitqueue *queue)
 {
+	u64 preempt_postamble;
+
 	OUT_PKT7(ring, CP_SET_PSEUDO_REG, 12);
 
 	OUT_RING(ring, SMMU_INFO);
@@ -381,6 +383,16 @@ static void a6xx_emit_set_pseudo_reg(struct msm_ringbuffer *ring,
 	/* seems OK to set to 0 to disable it */
 	OUT_RING(ring, 0);
 	OUT_RING(ring, 0);
+
+	/* Emit postamble to clear perfcounters */
+	preempt_postamble = a6xx_gpu->preempt_postamble_iova;
+
+	OUT_PKT7(ring, CP_SET_AMBLE, 3);
+	OUT_RING(ring, lower_32_bits(preempt_postamble));
+	OUT_RING(ring, upper_32_bits(preempt_postamble));
+	OUT_RING(ring, CP_SET_AMBLE_2_DWORDS(
+				 a6xx_gpu->preempt_postamble_len) |
+			 CP_SET_AMBLE_2_TYPE(KMD_AMBLE_TYPE));
 }
 
 static void a7xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
index 7fc994121676..ae13892c87e3 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
@@ -71,6 +71,12 @@ struct a6xx_gpu {
 	bool uses_gmem;
 	bool skip_save_restore;
 
+	struct drm_gem_object *preempt_postamble_bo;
+	void *preempt_postamble_ptr;
+	uint64_t preempt_postamble_iova;
+	uint64_t preempt_postamble_len;
+	bool postamble_enabled;
+
 	struct a6xx_gmu gmu;
 
 	struct drm_gem_object *shadow_bo;
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_preempt.c b/drivers/gpu/drm/msm/adreno/a6xx_preempt.c
index aa4bad394f9e..77c4d5e91854 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_preempt.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_preempt.c
@@ -97,6 +97,43 @@ static void a6xx_preempt_timer(struct timer_list *t)
 	kthread_queue_work(gpu->worker, &gpu->recover_work);
 }
 
+static void preempt_prepare_postamble(struct a6xx_gpu *a6xx_gpu)
+{
+	u32 *postamble = a6xx_gpu->preempt_postamble_ptr;
+	u32 count = 0;
+
+	postamble[count++] = PKT7(CP_REG_RMW, 3);
+	postamble[count++] = REG_A6XX_RBBM_PERFCTR_SRAM_INIT_CMD;
+	postamble[count++] = 0;
+	postamble[count++] = 1;
+
+	postamble[count++] = PKT7(CP_WAIT_REG_MEM, 6);
+	postamble[count++] = CP_WAIT_REG_MEM_0_FUNCTION(WRITE_EQ);
+	postamble[count++] = CP_WAIT_REG_MEM_1_POLL_ADDR_LO(
+				REG_A6XX_RBBM_PERFCTR_SRAM_INIT_STATUS);
+	postamble[count++] = CP_WAIT_REG_MEM_2_POLL_ADDR_HI(0);
+	postamble[count++] = CP_WAIT_REG_MEM_3_REF(0x1);
+	postamble[count++] = CP_WAIT_REG_MEM_4_MASK(0x1);
+	postamble[count++] = CP_WAIT_REG_MEM_5_DELAY_LOOP_CYCLES(0);
+
+	a6xx_gpu->preempt_postamble_len = count;
+
+	a6xx_gpu->postamble_enabled = true;
+}
+
+static void preempt_disable_postamble(struct a6xx_gpu *a6xx_gpu)
+{
+	u32 *postamble = a6xx_gpu->preempt_postamble_ptr;
+
+	/*
+	 * Disable the postamble by replacing the first packet header with a NOP
+	 * that covers the whole buffer.
+	 */
+	*postamble = PKT7(CP_NOP, (a6xx_gpu->preempt_postamble_len - 1));
+
+	a6xx_gpu->postamble_enabled = false;
+}
+
 void a6xx_preempt_irq(struct msm_gpu *gpu)
 {
 	uint32_t status;
@@ -187,6 +224,7 @@ void a6xx_preempt_trigger(struct msm_gpu *gpu)
 	unsigned long flags;
 	struct msm_ringbuffer *ring;
 	unsigned int cntl;
+	bool sysprof;
 
 	if (gpu->nr_rings == 1)
 		return;
@@ -272,6 +310,15 @@ void a6xx_preempt_trigger(struct msm_gpu *gpu)
 	/* Start a timer to catch a stuck preemption */
 	mod_timer(&a6xx_gpu->preempt_timer, jiffies + msecs_to_jiffies(10000));
 
+	/* Enable or disable postamble as needed */
+	sysprof = refcount_read(&a6xx_gpu->base.base.sysprof_active) > 1;
+
+	if (!sysprof && !a6xx_gpu->postamble_enabled)
+		preempt_prepare_postamble(a6xx_gpu);
+
+	if (sysprof && a6xx_gpu->postamble_enabled)
+		preempt_disable_postamble(a6xx_gpu);
+
 	/* Set the preemption state to triggered */
 	set_preempt_state(a6xx_gpu, PREEMPT_TRIGGERED);
 
@@ -359,6 +406,16 @@ void a6xx_preempt_init(struct msm_gpu *gpu)
 	a6xx_gpu->uses_gmem = 1;
 	a6xx_gpu->skip_save_restore = 1;
 
+	a6xx_gpu->preempt_postamble_ptr  = msm_gem_kernel_new(gpu->dev,
+			PAGE_SIZE, MSM_BO_WC | MSM_BO_MAP_PRIV,
+			gpu->aspace, &a6xx_gpu->preempt_postamble_bo,
+			&a6xx_gpu->preempt_postamble_iova);
+
+	if (IS_ERR(a6xx_gpu->preempt_postamble_ptr))
+		goto fail;
+
+	preempt_prepare_postamble(a6xx_gpu);
+
 	timer_setup(&a6xx_gpu->preempt_timer, a6xx_preempt_timer, 0);
 
 	return;
diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.h b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
index 6b1888280a83..87098567483b 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.h
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
@@ -610,12 +610,15 @@ OUT_PKT4(struct msm_ringbuffer *ring, uint16_t regindx, uint16_t cnt)
 	OUT_RING(ring, PKT4(regindx, cnt));
 }
 
+#define PKT7(opcode, cnt) \
+	(CP_TYPE7_PKT | ((cnt) << 0) | (PM4_PARITY(cnt) << 15) | \
+		(((opcode) & 0x7F) << 16) | (PM4_PARITY(opcode) << 23))
+
 static inline void
 OUT_PKT7(struct msm_ringbuffer *ring, uint8_t opcode, uint16_t cnt)
 {
 	adreno_wait_ring(ring, cnt + 1);
-	OUT_RING(ring, CP_TYPE7_PKT | (cnt << 0) | (PM4_PARITY(cnt) << 15) |
-		((opcode & 0x7F) << 16) | (PM4_PARITY(opcode) << 23));
+	OUT_RING(ring, PKT7(opcode, cnt));
 }
 
 struct msm_gpu *a2xx_gpu_init(struct drm_device *dev);

-- 
2.46.0



* Re: [PATCH v4 07/11] drm/msm/A6xx: Use posamble to reset counters on preemption
  2024-09-17 11:14 ` [PATCH v4 07/11] drm/msm/A6xx: Use posamble to reset counters on preemption Antonino Maniscalco
@ 2024-09-20 16:43   ` Akhil P Oommen
  0 siblings, 0 replies; 35+ messages in thread
From: Akhil P Oommen @ 2024-09-20 16:43 UTC (permalink / raw)
  To: Antonino Maniscalco
  Cc: Rob Clark, Sean Paul, Konrad Dybcio, Abhinav Kumar,
	Dmitry Baryshkov, Marijn Suijten, David Airlie, Daniel Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet, linux-arm-msm, dri-devel, freedreno,
	linux-kernel, linux-doc

On Tue, Sep 17, 2024 at 01:14:17PM +0200, Antonino Maniscalco wrote:
> Use the postamble to reset perf counters when switching between rings,
> except when sysprof is enabled, analogously to how they are reset
> between submissions when switching pagetables.
> 
> Signed-off-by: Antonino Maniscalco <antomani103@gmail.com>

Reviewed-by: Akhil P Oommen <quic_akhilpo@quicinc.com>

-Akhil

> ---
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c     | 12 +++++++
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.h     |  6 ++++
>  drivers/gpu/drm/msm/adreno/a6xx_preempt.c | 57 +++++++++++++++++++++++++++++++
>  drivers/gpu/drm/msm/adreno/adreno_gpu.h   |  7 ++--
>  4 files changed, 80 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> index 355a3e210335..736f475d696f 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> @@ -358,6 +358,8 @@ static void a6xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
>  static void a6xx_emit_set_pseudo_reg(struct msm_ringbuffer *ring,
>  		struct a6xx_gpu *a6xx_gpu, struct msm_gpu_submitqueue *queue)
>  {
> +	u64 preempt_postamble;
> +
>  	OUT_PKT7(ring, CP_SET_PSEUDO_REG, 12);
>  
>  	OUT_RING(ring, SMMU_INFO);
> @@ -381,6 +383,16 @@ static void a6xx_emit_set_pseudo_reg(struct msm_ringbuffer *ring,
>  	/* seems OK to set to 0 to disable it */
>  	OUT_RING(ring, 0);
>  	OUT_RING(ring, 0);
> +
> +	/* Emit postamble to clear perfcounters */
> +	preempt_postamble = a6xx_gpu->preempt_postamble_iova;
> +
> +	OUT_PKT7(ring, CP_SET_AMBLE, 3);
> +	OUT_RING(ring, lower_32_bits(preempt_postamble));
> +	OUT_RING(ring, upper_32_bits(preempt_postamble));
> +	OUT_RING(ring, CP_SET_AMBLE_2_DWORDS(
> +				 a6xx_gpu->preempt_postamble_len) |
> +			 CP_SET_AMBLE_2_TYPE(KMD_AMBLE_TYPE));
>  }
>  
>  static void a7xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> index 7fc994121676..ae13892c87e3 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> @@ -71,6 +71,12 @@ struct a6xx_gpu {
>  	bool uses_gmem;
>  	bool skip_save_restore;
>  
> +	struct drm_gem_object *preempt_postamble_bo;
> +	void *preempt_postamble_ptr;
> +	uint64_t preempt_postamble_iova;
> +	uint64_t preempt_postamble_len;
> +	bool postamble_enabled;
> +
>  	struct a6xx_gmu gmu;
>  
>  	struct drm_gem_object *shadow_bo;
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_preempt.c b/drivers/gpu/drm/msm/adreno/a6xx_preempt.c
> index aa4bad394f9e..77c4d5e91854 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_preempt.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_preempt.c
> @@ -97,6 +97,43 @@ static void a6xx_preempt_timer(struct timer_list *t)
>  	kthread_queue_work(gpu->worker, &gpu->recover_work);
>  }
>  
> +static void preempt_prepare_postamble(struct a6xx_gpu *a6xx_gpu)
> +{
> +	u32 *postamble = a6xx_gpu->preempt_postamble_ptr;
> +	u32 count = 0;
> +
> +	postamble[count++] = PKT7(CP_REG_RMW, 3);
> +	postamble[count++] = REG_A6XX_RBBM_PERFCTR_SRAM_INIT_CMD;
> +	postamble[count++] = 0;
> +	postamble[count++] = 1;
> +
> +	postamble[count++] = PKT7(CP_WAIT_REG_MEM, 6);
> +	postamble[count++] = CP_WAIT_REG_MEM_0_FUNCTION(WRITE_EQ);
> +	postamble[count++] = CP_WAIT_REG_MEM_1_POLL_ADDR_LO(
> +				REG_A6XX_RBBM_PERFCTR_SRAM_INIT_STATUS);
> +	postamble[count++] = CP_WAIT_REG_MEM_2_POLL_ADDR_HI(0);
> +	postamble[count++] = CP_WAIT_REG_MEM_3_REF(0x1);
> +	postamble[count++] = CP_WAIT_REG_MEM_4_MASK(0x1);
> +	postamble[count++] = CP_WAIT_REG_MEM_5_DELAY_LOOP_CYCLES(0);
> +
> +	a6xx_gpu->preempt_postamble_len = count;
> +
> +	a6xx_gpu->postamble_enabled = true;
> +}
> +
> +static void preempt_disable_postamble(struct a6xx_gpu *a6xx_gpu)
> +{
> +	u32 *postamble = a6xx_gpu->preempt_postamble_ptr;
> +
> +	/*
> +	 * Disable the postamble by replacing the first packet header with a NOP
> +	 * that covers the whole buffer.
> +	 */
> +	*postamble = PKT7(CP_NOP, (a6xx_gpu->preempt_postamble_len - 1));
> +
> +	a6xx_gpu->postamble_enabled = false;
> +}
> +
>  void a6xx_preempt_irq(struct msm_gpu *gpu)
>  {
>  	uint32_t status;
> @@ -187,6 +224,7 @@ void a6xx_preempt_trigger(struct msm_gpu *gpu)
>  	unsigned long flags;
>  	struct msm_ringbuffer *ring;
>  	unsigned int cntl;
> +	bool sysprof;
>  
>  	if (gpu->nr_rings == 1)
>  		return;
> @@ -272,6 +310,15 @@ void a6xx_preempt_trigger(struct msm_gpu *gpu)
>  	/* Start a timer to catch a stuck preemption */
>  	mod_timer(&a6xx_gpu->preempt_timer, jiffies + msecs_to_jiffies(10000));
>  
> +	/* Enable or disable postamble as needed */
> +	sysprof = refcount_read(&a6xx_gpu->base.base.sysprof_active) > 1;
> +
> +	if (!sysprof && !a6xx_gpu->postamble_enabled)
> +		preempt_prepare_postamble(a6xx_gpu);
> +
> +	if (sysprof && a6xx_gpu->postamble_enabled)
> +		preempt_disable_postamble(a6xx_gpu);
> +
>  	/* Set the preemption state to triggered */
>  	set_preempt_state(a6xx_gpu, PREEMPT_TRIGGERED);
>  
> @@ -359,6 +406,16 @@ void a6xx_preempt_init(struct msm_gpu *gpu)
>  	a6xx_gpu->uses_gmem = 1;
>  	a6xx_gpu->skip_save_restore = 1;
>  
> +	a6xx_gpu->preempt_postamble_ptr  = msm_gem_kernel_new(gpu->dev,
> +			PAGE_SIZE, MSM_BO_WC | MSM_BO_MAP_PRIV,
> +			gpu->aspace, &a6xx_gpu->preempt_postamble_bo,
> +			&a6xx_gpu->preempt_postamble_iova);
> +
> +	if (IS_ERR(a6xx_gpu->preempt_postamble_ptr))
> +		goto fail;
> +
> +	preempt_prepare_postamble(a6xx_gpu);
> +
>  	timer_setup(&a6xx_gpu->preempt_timer, a6xx_preempt_timer, 0);
>  
>  	return;
> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.h b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> index 6b1888280a83..87098567483b 100644
> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> @@ -610,12 +610,15 @@ OUT_PKT4(struct msm_ringbuffer *ring, uint16_t regindx, uint16_t cnt)
>  	OUT_RING(ring, PKT4(regindx, cnt));
>  }
>  
> +#define PKT7(opcode, cnt) \
> +	(CP_TYPE7_PKT | ((cnt) << 0) | (PM4_PARITY(cnt) << 15) | \
> +		(((opcode) & 0x7F) << 16) | (PM4_PARITY(opcode) << 23))
> +
>  static inline void
>  OUT_PKT7(struct msm_ringbuffer *ring, uint8_t opcode, uint16_t cnt)
>  {
>  	adreno_wait_ring(ring, cnt + 1);
> -	OUT_RING(ring, CP_TYPE7_PKT | (cnt << 0) | (PM4_PARITY(cnt) << 15) |
> -		((opcode & 0x7F) << 16) | (PM4_PARITY(opcode) << 23));
> +	OUT_RING(ring, PKT7(opcode, cnt));
>  }
>  
>  struct msm_gpu *a2xx_gpu_init(struct drm_device *dev);
> 
> -- 
> 2.46.0
> 


* [PATCH v4 08/11] drm/msm/A6xx: Add traces for preemption
  2024-09-17 11:14 [PATCH v4 00/11] Preemption support for A7XX Antonino Maniscalco
                   ` (6 preceding siblings ...)
  2024-09-17 11:14 ` [PATCH v4 07/11] drm/msm/A6xx: Use posamble to reset counters on preemption Antonino Maniscalco
@ 2024-09-17 11:14 ` Antonino Maniscalco
  2024-09-20 16:50   ` Akhil P Oommen
  2024-09-17 11:14 ` [PATCH v4 09/11] drm/msm/A6XX: Add a flag to allow preemption to submitqueue_create Antonino Maniscalco
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 35+ messages in thread
From: Antonino Maniscalco @ 2024-09-17 11:14 UTC (permalink / raw)
  To: Rob Clark, Sean Paul, Konrad Dybcio, Abhinav Kumar,
	Dmitry Baryshkov, Marijn Suijten, David Airlie, Daniel Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet
  Cc: linux-arm-msm, dri-devel, freedreno, linux-kernel, linux-doc,
	Antonino Maniscalco, Neil Armstrong

Add trace points corresponding to preemption being triggered and being
completed for latency measurement purposes.

Signed-off-by: Antonino Maniscalco <antomani103@gmail.com>
Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8650-QRD
---
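Not part of the commit: a small sketch of how the two events could be
paired up offline to estimate preemption latency. It assumes the message
formats added below ("preempting A -> B" and "preempted to A") and an
ftrace-style timestamp of the form seconds.microseconds with exactly six
fractional digits; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Parse the message of an msm_gpu_preemption_trigger event; 1 on success. */
static int parse_trigger_sketch(const char *msg, int *from, int *to)
{
	assert(msg && from && to);
	return sscanf(msg, "preempting %d -> %d", from, to) == 2;
}

/* Parse the message of an msm_gpu_preemption_irq event; 1 on success. */
static int parse_irq_sketch(const char *msg, unsigned int *ring)
{
	assert(msg && ring);
	return sscanf(msg, "preempted to %u", ring) == 1;
}

/*
 * Convert a "seconds.microseconds" timestamp into microseconds.
 * Assumes six fractional digits; returns 1 on success.
 */
static int parse_ts_sketch(const char *s, unsigned long long *us)
{
	unsigned long long sec, frac;

	assert(s && us);
	if (sscanf(s, "%llu.%6llu", &sec, &frac) != 2)
		return 0;
	*us = sec * 1000000ULL + frac;
	return 1;
}
```

The latency for one switch is then the irq timestamp minus the trigger
timestamp, taken for a trigger whose target ring matches the ring
reported by the following irq event.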
 drivers/gpu/drm/msm/adreno/a6xx_preempt.c |  6 ++++++
 drivers/gpu/drm/msm/msm_gpu_trace.h       | 28 ++++++++++++++++++++++++++++
 2 files changed, 34 insertions(+)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_preempt.c b/drivers/gpu/drm/msm/adreno/a6xx_preempt.c
index 77c4d5e91854..4fbc66d6860a 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_preempt.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_preempt.c
@@ -7,6 +7,7 @@
 #include "a6xx_gpu.h"
 #include "a6xx_gmu.xml.h"
 #include "msm_mmu.h"
+#include "msm_gpu_trace.h"
 
 /*
  * Try to transition the preemption state from old to new. Return
@@ -174,6 +175,8 @@ void a6xx_preempt_irq(struct msm_gpu *gpu)
 
 	set_preempt_state(a6xx_gpu, PREEMPT_NONE);
 
+	trace_msm_gpu_preemption_irq(a6xx_gpu->cur_ring->id);
+
 	/*
 	 * Retrigger preemption to avoid a deadlock that might occur when preemption
 	 * is skipped due to it being already in flight when requested.
@@ -295,6 +298,9 @@ void a6xx_preempt_trigger(struct msm_gpu *gpu)
 	 */
 	ring->restore_wptr = false;
 
+	trace_msm_gpu_preemption_trigger(a6xx_gpu->cur_ring->id,
+		ring ? ring->id : -1);
+
 	spin_unlock_irqrestore(&ring->preempt_lock, flags);
 
 	gpu_write64(gpu,
diff --git a/drivers/gpu/drm/msm/msm_gpu_trace.h b/drivers/gpu/drm/msm/msm_gpu_trace.h
index ac40d857bc45..7f863282db0d 100644
--- a/drivers/gpu/drm/msm/msm_gpu_trace.h
+++ b/drivers/gpu/drm/msm/msm_gpu_trace.h
@@ -177,6 +177,34 @@ TRACE_EVENT(msm_gpu_resume,
 		TP_printk("%u", __entry->dummy)
 );
 
+TRACE_EVENT(msm_gpu_preemption_trigger,
+		TP_PROTO(int ring_id_from, int ring_id_to),
+		TP_ARGS(ring_id_from, ring_id_to),
+		TP_STRUCT__entry(
+			__field(int, ring_id_from)
+			__field(int, ring_id_to)
+			),
+		TP_fast_assign(
+			__entry->ring_id_from = ring_id_from;
+			__entry->ring_id_to = ring_id_to;
+			),
+		TP_printk("preempting %d -> %d",
+			  __entry->ring_id_from,
+			  __entry->ring_id_to)
+);
+
+TRACE_EVENT(msm_gpu_preemption_irq,
+		TP_PROTO(u32 ring_id),
+		TP_ARGS(ring_id),
+		TP_STRUCT__entry(
+			__field(u32, ring_id)
+			),
+		TP_fast_assign(
+			__entry->ring_id = ring_id;
+			),
+		TP_printk("preempted to %u", __entry->ring_id)
+);
+
 #endif
 
 #undef TRACE_INCLUDE_PATH

-- 
2.46.0



* Re: [PATCH v4 08/11] drm/msm/A6xx: Add traces for preemption
  2024-09-17 11:14 ` [PATCH v4 08/11] drm/msm/A6xx: Add traces for preemption Antonino Maniscalco
@ 2024-09-20 16:50   ` Akhil P Oommen
  0 siblings, 0 replies; 35+ messages in thread
From: Akhil P Oommen @ 2024-09-20 16:50 UTC (permalink / raw)
  To: Antonino Maniscalco
  Cc: Rob Clark, Sean Paul, Konrad Dybcio, Abhinav Kumar,
	Dmitry Baryshkov, Marijn Suijten, David Airlie, Daniel Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet, linux-arm-msm, dri-devel, freedreno,
	linux-kernel, linux-doc, Neil Armstrong

On Tue, Sep 17, 2024 at 01:14:18PM +0200, Antonino Maniscalco wrote:
> Add trace points corresponding to preemption being triggered and being
> completed for latency measurement purposes.
> 
> Signed-off-by: Antonino Maniscalco <antomani103@gmail.com>
> Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8650-QRD

Reviewed-by: Akhil P Oommen <quic_akhilpo@quicinc.com>

-Akhil

> ---
>  drivers/gpu/drm/msm/adreno/a6xx_preempt.c |  6 ++++++
>  drivers/gpu/drm/msm/msm_gpu_trace.h       | 28 ++++++++++++++++++++++++++++
>  2 files changed, 34 insertions(+)
> 
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_preempt.c b/drivers/gpu/drm/msm/adreno/a6xx_preempt.c
> index 77c4d5e91854..4fbc66d6860a 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_preempt.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_preempt.c
> @@ -7,6 +7,7 @@
>  #include "a6xx_gpu.h"
>  #include "a6xx_gmu.xml.h"
>  #include "msm_mmu.h"
> +#include "msm_gpu_trace.h"
>  
>  /*
>   * Try to transition the preemption state from old to new. Return
> @@ -174,6 +175,8 @@ void a6xx_preempt_irq(struct msm_gpu *gpu)
>  
>  	set_preempt_state(a6xx_gpu, PREEMPT_NONE);
>  
> +	trace_msm_gpu_preemption_irq(a6xx_gpu->cur_ring->id);
> +
>  	/*
>  	 * Retrigger preemption to avoid a deadlock that might occur when preemption
>  	 * is skipped due to it being already in flight when requested.
> @@ -295,6 +298,9 @@ void a6xx_preempt_trigger(struct msm_gpu *gpu)
>  	 */
>  	ring->restore_wptr = false;
>  
> +	trace_msm_gpu_preemption_trigger(a6xx_gpu->cur_ring->id,
> +		ring ? ring->id : -1);
> +
>  	spin_unlock_irqrestore(&ring->preempt_lock, flags);
>  
>  	gpu_write64(gpu,
> diff --git a/drivers/gpu/drm/msm/msm_gpu_trace.h b/drivers/gpu/drm/msm/msm_gpu_trace.h
> index ac40d857bc45..7f863282db0d 100644
> --- a/drivers/gpu/drm/msm/msm_gpu_trace.h
> +++ b/drivers/gpu/drm/msm/msm_gpu_trace.h
> @@ -177,6 +177,34 @@ TRACE_EVENT(msm_gpu_resume,
>  		TP_printk("%u", __entry->dummy)
>  );
>  
> +TRACE_EVENT(msm_gpu_preemption_trigger,
> +		TP_PROTO(int ring_id_from, int ring_id_to),
> +		TP_ARGS(ring_id_from, ring_id_to),
> +		TP_STRUCT__entry(
> +			__field(int, ring_id_from)
> +			__field(int, ring_id_to)
> +			),
> +		TP_fast_assign(
> +			__entry->ring_id_from = ring_id_from;
> +			__entry->ring_id_to = ring_id_to;
> +			),
> +		TP_printk("preempting %d -> %d",
> +			  __entry->ring_id_from,
> +			  __entry->ring_id_to)
> +);
> +
> +TRACE_EVENT(msm_gpu_preemption_irq,
> +		TP_PROTO(u32 ring_id),
> +		TP_ARGS(ring_id),
> +		TP_STRUCT__entry(
> +			__field(u32, ring_id)
> +			),
> +		TP_fast_assign(
> +			__entry->ring_id = ring_id;
> +			),
> +		TP_printk("preempted to %u", __entry->ring_id)
> +);
> +
>  #endif
>  
>  #undef TRACE_INCLUDE_PATH
> 
> -- 
> 2.46.0
> 


* [PATCH v4 09/11] drm/msm/A6XX: Add a flag to allow preemption to submitqueue_create
  2024-09-17 11:14 [PATCH v4 00/11] Preemption support for A7XX Antonino Maniscalco
                   ` (7 preceding siblings ...)
  2024-09-17 11:14 ` [PATCH v4 08/11] drm/msm/A6xx: Add traces for preemption Antonino Maniscalco
@ 2024-09-17 11:14 ` Antonino Maniscalco
  2024-09-20 16:54   ` Akhil P Oommen
  2024-09-17 11:14 ` [PATCH v4 10/11] drm/msm/A6xx: Enable preemption for A750 Antonino Maniscalco
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 35+ messages in thread
From: Antonino Maniscalco @ 2024-09-17 11:14 UTC (permalink / raw)
  To: Rob Clark, Sean Paul, Konrad Dybcio, Abhinav Kumar,
	Dmitry Baryshkov, Marijn Suijten, David Airlie, Daniel Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet
  Cc: linux-arm-msm, dri-devel, freedreno, linux-kernel, linux-doc,
	Antonino Maniscalco, Neil Armstrong

Some userspace changes are necessary, so add a flag for userspace to
advertise support for preemption when creating the submitqueue.

When this flag is not set, preemption will not be allowed in the middle
of the submitted IBs, therefore maintaining compatibility with older
userspace.

The flag is rejected if preemption is not supported on the target; this
allows userspace to know whether preemption is supported.

Signed-off-by: Antonino Maniscalco <antomani103@gmail.com>
Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8650-QRD
---
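Not part of the commit: a sketch of how userspace might probe for
preemption support using the rejection behaviour described above. The
check function is a stand-in that models the kernel side (including an
assumed rejection of unknown flag bits); it is not the real ioctl path,
and the names ending in `_sketch` are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MSM_SUBMITQUEUE_ALLOW_PREEMPT 0x00000001u
#define MSM_SUBMITQUEUE_FLAGS         (MSM_SUBMITQUEUE_ALLOW_PREEMPT | 0)

/*
 * Model of the kernel-side validation: unknown bits are rejected
 * (assumption), and ALLOW_PREEMPT is rejected when only one ring exists.
 */
static int submitqueue_check_flags_sketch(uint32_t flags, unsigned int nr_rings)
{
	assert(nr_rings >= 1);

	if (flags & ~MSM_SUBMITQUEUE_FLAGS)
		return -EINVAL;
	if ((flags & MSM_SUBMITQUEUE_ALLOW_PREEMPT) && nr_rings == 1)
		return -EINVAL;
	return 0;
}

/*
 * Userspace-style probe: request preemption first and retry without the
 * flag if that is rejected. Returns the accepted flags or a negative errno.
 */
static int submitqueue_probe_sketch(unsigned int nr_rings)
{
	uint32_t flags = MSM_SUBMITQUEUE_ALLOW_PREEMPT;
	int ret = submitqueue_check_flags_sketch(flags, nr_rings);

	if (ret == -EINVAL) {
		flags = 0;
		ret = submitqueue_check_flags_sketch(flags, nr_rings);
	}
	return ret ? ret : (int)flags;
}
```

A caller that gets 0 back knows the queue was created without preemption
and can skip emitting any preemption-specific command stream state.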
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 12 ++++++++----
 drivers/gpu/drm/msm/msm_submitqueue.c |  3 +++
 include/uapi/drm/msm_drm.h            |  5 ++++-
 3 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index 736f475d696f..edbcb6d229ba 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -430,8 +430,10 @@ static void a7xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
 	OUT_PKT7(ring, CP_SET_MARKER, 1);
 	OUT_RING(ring, 0x101); /* IFPC disable */
 
-	OUT_PKT7(ring, CP_SET_MARKER, 1);
-	OUT_RING(ring, 0x00d); /* IB1LIST start */
+	if (submit->queue->flags & MSM_SUBMITQUEUE_ALLOW_PREEMPT) {
+		OUT_PKT7(ring, CP_SET_MARKER, 1);
+		OUT_RING(ring, 0x00d); /* IB1LIST start */
+	}
 
 	/* Submit the commands */
 	for (i = 0; i < submit->nr_cmds; i++) {
@@ -462,8 +464,10 @@ static void a7xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
 			update_shadow_rptr(gpu, ring);
 	}
 
-	OUT_PKT7(ring, CP_SET_MARKER, 1);
-	OUT_RING(ring, 0x00e); /* IB1LIST end */
+	if (submit->queue->flags & MSM_SUBMITQUEUE_ALLOW_PREEMPT) {
+		OUT_PKT7(ring, CP_SET_MARKER, 1);
+		OUT_RING(ring, 0x00e); /* IB1LIST end */
+	}
 
 	get_stats_counter(ring, REG_A7XX_RBBM_PERFCTR_CP(0),
 		rbmemptr_stats(ring, index, cpcycles_end));
diff --git a/drivers/gpu/drm/msm/msm_submitqueue.c b/drivers/gpu/drm/msm/msm_submitqueue.c
index 0e803125a325..9b3ffca3f3b4 100644
--- a/drivers/gpu/drm/msm/msm_submitqueue.c
+++ b/drivers/gpu/drm/msm/msm_submitqueue.c
@@ -170,6 +170,9 @@ int msm_submitqueue_create(struct drm_device *drm, struct msm_file_private *ctx,
 	if (!priv->gpu)
 		return -ENODEV;
 
+	if (flags & MSM_SUBMITQUEUE_ALLOW_PREEMPT && priv->gpu->nr_rings == 1)
+		return -EINVAL;
+
 	ret = msm_gpu_convert_priority(priv->gpu, prio, &ring_nr, &sched_prio);
 	if (ret)
 		return ret;
diff --git a/include/uapi/drm/msm_drm.h b/include/uapi/drm/msm_drm.h
index 3fca72f73861..f37858db34e6 100644
--- a/include/uapi/drm/msm_drm.h
+++ b/include/uapi/drm/msm_drm.h
@@ -345,7 +345,10 @@ struct drm_msm_gem_madvise {
  * backwards compatibility as a "default" submitqueue
  */
 
-#define MSM_SUBMITQUEUE_FLAGS (0)
+#define MSM_SUBMITQUEUE_ALLOW_PREEMPT	0x00000001
+#define MSM_SUBMITQUEUE_FLAGS		    ( \
+		MSM_SUBMITQUEUE_ALLOW_PREEMPT | \
+		0)
 
 /*
  * The submitqueue priority should be between 0 and MSM_PARAM_PRIORITIES-1,

-- 
2.46.0



* Re: [PATCH v4 09/11] drm/msm/A6XX: Add a flag to allow preemption to submitqueue_create
  2024-09-17 11:14 ` [PATCH v4 09/11] drm/msm/A6XX: Add a flag to allow preemption to submitqueue_create Antonino Maniscalco
@ 2024-09-20 16:54   ` Akhil P Oommen
  2024-09-20 17:29     ` Rob Clark
  0 siblings, 1 reply; 35+ messages in thread
From: Akhil P Oommen @ 2024-09-20 16:54 UTC (permalink / raw)
  To: Antonino Maniscalco
  Cc: Rob Clark, Sean Paul, Konrad Dybcio, Abhinav Kumar,
	Dmitry Baryshkov, Marijn Suijten, David Airlie, Daniel Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet, linux-arm-msm, dri-devel, freedreno,
	linux-kernel, linux-doc, Neil Armstrong

On Tue, Sep 17, 2024 at 01:14:19PM +0200, Antonino Maniscalco wrote:
> Some userspace changes are necessary, so add a flag for userspace to
> advertise support for preemption when creating the submitqueue.
> 
> When this flag is not set, preemption will not be allowed in the middle
> of the submitted IBs, therefore maintaining compatibility with older
> userspace.
> 
> The flag is rejected if preemption is not supported on the target; this
> allows userspace to know whether preemption is supported.

Just curious, what is the motivation behind informing userspace about
preemption support?

-Akhil

> 
> Signed-off-by: Antonino Maniscalco <antomani103@gmail.com>
> Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8650-QRD
> ---
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 12 ++++++++----
>  drivers/gpu/drm/msm/msm_submitqueue.c |  3 +++
>  include/uapi/drm/msm_drm.h            |  5 ++++-
>  3 files changed, 15 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> index 736f475d696f..edbcb6d229ba 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> @@ -430,8 +430,10 @@ static void a7xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
>  	OUT_PKT7(ring, CP_SET_MARKER, 1);
>  	OUT_RING(ring, 0x101); /* IFPC disable */
>  
> -	OUT_PKT7(ring, CP_SET_MARKER, 1);
> -	OUT_RING(ring, 0x00d); /* IB1LIST start */
> +	if (submit->queue->flags & MSM_SUBMITQUEUE_ALLOW_PREEMPT) {
> +		OUT_PKT7(ring, CP_SET_MARKER, 1);
> +		OUT_RING(ring, 0x00d); /* IB1LIST start */
> +	}
>  
>  	/* Submit the commands */
>  	for (i = 0; i < submit->nr_cmds; i++) {
> @@ -462,8 +464,10 @@ static void a7xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
>  			update_shadow_rptr(gpu, ring);
>  	}
>  
> -	OUT_PKT7(ring, CP_SET_MARKER, 1);
> -	OUT_RING(ring, 0x00e); /* IB1LIST end */
> +	if (submit->queue->flags & MSM_SUBMITQUEUE_ALLOW_PREEMPT) {
> +		OUT_PKT7(ring, CP_SET_MARKER, 1);
> +		OUT_RING(ring, 0x00e); /* IB1LIST end */
> +	}
>  
>  	get_stats_counter(ring, REG_A7XX_RBBM_PERFCTR_CP(0),
>  		rbmemptr_stats(ring, index, cpcycles_end));
> diff --git a/drivers/gpu/drm/msm/msm_submitqueue.c b/drivers/gpu/drm/msm/msm_submitqueue.c
> index 0e803125a325..9b3ffca3f3b4 100644
> --- a/drivers/gpu/drm/msm/msm_submitqueue.c
> +++ b/drivers/gpu/drm/msm/msm_submitqueue.c
> @@ -170,6 +170,9 @@ int msm_submitqueue_create(struct drm_device *drm, struct msm_file_private *ctx,
>  	if (!priv->gpu)
>  		return -ENODEV;
>  
> +	if (flags & MSM_SUBMITQUEUE_ALLOW_PREEMPT && priv->gpu->nr_rings == 1)
> +		return -EINVAL;
> +
>  	ret = msm_gpu_convert_priority(priv->gpu, prio, &ring_nr, &sched_prio);
>  	if (ret)
>  		return ret;
> diff --git a/include/uapi/drm/msm_drm.h b/include/uapi/drm/msm_drm.h
> index 3fca72f73861..f37858db34e6 100644
> --- a/include/uapi/drm/msm_drm.h
> +++ b/include/uapi/drm/msm_drm.h
> @@ -345,7 +345,10 @@ struct drm_msm_gem_madvise {
>   * backwards compatibility as a "default" submitqueue
>   */
>  
> -#define MSM_SUBMITQUEUE_FLAGS (0)
> +#define MSM_SUBMITQUEUE_ALLOW_PREEMPT	0x00000001
> +#define MSM_SUBMITQUEUE_FLAGS		    ( \
> +		MSM_SUBMITQUEUE_ALLOW_PREEMPT | \
> +		0)
>  
>  /*
>   * The submitqueue priority should be between 0 and MSM_PARAM_PRIORITIES-1,
> 
> -- 
> 2.46.0
> 


* Re: [PATCH v4 09/11] drm/msm/A6XX: Add a flag to allow preemption to submitqueue_create
  2024-09-20 16:54   ` Akhil P Oommen
@ 2024-09-20 17:29     ` Rob Clark
  2024-09-23 19:44       ` Akhil P Oommen
  0 siblings, 1 reply; 35+ messages in thread
From: Rob Clark @ 2024-09-20 17:29 UTC (permalink / raw)
  To: Akhil P Oommen
  Cc: Antonino Maniscalco, Sean Paul, Konrad Dybcio, Abhinav Kumar,
	Dmitry Baryshkov, Marijn Suijten, David Airlie, Daniel Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet, linux-arm-msm, dri-devel, freedreno,
	linux-kernel, linux-doc, Neil Armstrong

On Fri, Sep 20, 2024 at 9:54 AM Akhil P Oommen <quic_akhilpo@quicinc.com> wrote:
>
> On Tue, Sep 17, 2024 at 01:14:19PM +0200, Antonino Maniscalco wrote:
> > Some userspace changes are necessary, so add a flag for userspace to
> > advertise support for preemption when creating the submitqueue.
> >
> > When this flag is not set, preemption will not be allowed in the middle
> > of the submitted IBs, therefore maintaining compatibility with older
> > userspace.
> >
> > The flag is rejected if preemption is not supported on the target; this
> > allows userspace to know whether preemption is supported.
>
> Just curious, what is the motivation behind informing userspace about
> preemption support?

I think I requested that, as a "just in case" (because it would
otherwise be awkward if we later needed to know the difference btwn
drm/sched "preemption" which can only happen before submit is written
to ring and "real" preemption)

BR,
-R

> -Akhil
>
> >
> > Signed-off-by: Antonino Maniscalco <antomani103@gmail.com>
> > Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8650-QRD
> > ---
> >  drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 12 ++++++++----
> >  drivers/gpu/drm/msm/msm_submitqueue.c |  3 +++
> >  include/uapi/drm/msm_drm.h            |  5 ++++-
> >  3 files changed, 15 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > index 736f475d696f..edbcb6d229ba 100644
> > --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > @@ -430,8 +430,10 @@ static void a7xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
> >       OUT_PKT7(ring, CP_SET_MARKER, 1);
> >       OUT_RING(ring, 0x101); /* IFPC disable */
> >
> > -     OUT_PKT7(ring, CP_SET_MARKER, 1);
> > -     OUT_RING(ring, 0x00d); /* IB1LIST start */
> > +     if (submit->queue->flags & MSM_SUBMITQUEUE_ALLOW_PREEMPT) {
> > +             OUT_PKT7(ring, CP_SET_MARKER, 1);
> > +             OUT_RING(ring, 0x00d); /* IB1LIST start */
> > +     }
> >
> >       /* Submit the commands */
> >       for (i = 0; i < submit->nr_cmds; i++) {
> > @@ -462,8 +464,10 @@ static void a7xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
> >                       update_shadow_rptr(gpu, ring);
> >       }
> >
> > -     OUT_PKT7(ring, CP_SET_MARKER, 1);
> > -     OUT_RING(ring, 0x00e); /* IB1LIST end */
> > +     if (submit->queue->flags & MSM_SUBMITQUEUE_ALLOW_PREEMPT) {
> > +             OUT_PKT7(ring, CP_SET_MARKER, 1);
> > +             OUT_RING(ring, 0x00e); /* IB1LIST end */
> > +     }
> >
> >       get_stats_counter(ring, REG_A7XX_RBBM_PERFCTR_CP(0),
> >               rbmemptr_stats(ring, index, cpcycles_end));
> > diff --git a/drivers/gpu/drm/msm/msm_submitqueue.c b/drivers/gpu/drm/msm/msm_submitqueue.c
> > index 0e803125a325..9b3ffca3f3b4 100644
> > --- a/drivers/gpu/drm/msm/msm_submitqueue.c
> > +++ b/drivers/gpu/drm/msm/msm_submitqueue.c
> > @@ -170,6 +170,9 @@ int msm_submitqueue_create(struct drm_device *drm, struct msm_file_private *ctx,
> >       if (!priv->gpu)
> >               return -ENODEV;
> >
> > +     if (flags & MSM_SUBMITQUEUE_ALLOW_PREEMPT && priv->gpu->nr_rings == 1)
> > +             return -EINVAL;
> > +
> >       ret = msm_gpu_convert_priority(priv->gpu, prio, &ring_nr, &sched_prio);
> >       if (ret)
> >               return ret;
> > diff --git a/include/uapi/drm/msm_drm.h b/include/uapi/drm/msm_drm.h
> > index 3fca72f73861..f37858db34e6 100644
> > --- a/include/uapi/drm/msm_drm.h
> > +++ b/include/uapi/drm/msm_drm.h
> > @@ -345,7 +345,10 @@ struct drm_msm_gem_madvise {
> >   * backwards compatibility as a "default" submitqueue
> >   */
> >
> > -#define MSM_SUBMITQUEUE_FLAGS (0)
> > +#define MSM_SUBMITQUEUE_ALLOW_PREEMPT        0x00000001
> > +#define MSM_SUBMITQUEUE_FLAGS                    ( \
> > +             MSM_SUBMITQUEUE_ALLOW_PREEMPT | \
> > +             0)
> >
> >  /*
> >   * The submitqueue priority should be between 0 and MSM_PARAM_PRIORITIES-1,
> >
> > --
> > 2.46.0
> >

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 09/11] drm/msm/A6XX: Add a flag to allow preemption to submitqueue_create
  2024-09-20 17:29     ` Rob Clark
@ 2024-09-23 19:44       ` Akhil P Oommen
  0 siblings, 0 replies; 35+ messages in thread
From: Akhil P Oommen @ 2024-09-23 19:44 UTC (permalink / raw)
  To: Rob Clark
  Cc: Antonino Maniscalco, Sean Paul, Konrad Dybcio, Abhinav Kumar,
	Dmitry Baryshkov, Marijn Suijten, David Airlie, Daniel Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet, linux-arm-msm, dri-devel, freedreno,
	linux-kernel, linux-doc, Neil Armstrong

On Fri, Sep 20, 2024 at 10:29:44AM -0700, Rob Clark wrote:
> On Fri, Sep 20, 2024 at 9:54 AM Akhil P Oommen <quic_akhilpo@quicinc.com> wrote:
> >
> > On Tue, Sep 17, 2024 at 01:14:19PM +0200, Antonino Maniscalco wrote:
> > > Some userspace changes are necessary so add a flag for userspace to
> > > advertise support for preemption when creating the submitqueue.
> > >
> > > When this flag is not set preemption will not be allowed in the middle
> > > of the submitted IBs therefore mantaining compatibility with older
> > > userspace.
> > >
> > > The flag is rejected if preemption is not supported on the target, this
> > > allows userspace to know whether preemption is supported.
> >
> > Just curious, what is the motivation behind informing userspace about
> > preemption support?
> 
> I think I requested that, as a "just in case" (because it would
> otherwise be awkward if we later needed to know the difference btwn
> drm/sched "preemption" which can only happen before submit is written
> to ring and "real" preemption)

Thanks. That makes sense.

-Akhil

> 
> BR,
> -R
> 
> > -Akhil
> >
> > >
> > > Signed-off-by: Antonino Maniscalco <antomani103@gmail.com>
> > > Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8650-QRD
> > > ---
> > >  drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 12 ++++++++----
> > >  drivers/gpu/drm/msm/msm_submitqueue.c |  3 +++
> > >  include/uapi/drm/msm_drm.h            |  5 ++++-
> > >  3 files changed, 15 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > > index 736f475d696f..edbcb6d229ba 100644
> > > --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > > +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > > @@ -430,8 +430,10 @@ static void a7xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
> > >       OUT_PKT7(ring, CP_SET_MARKER, 1);
> > >       OUT_RING(ring, 0x101); /* IFPC disable */
> > >
> > > -     OUT_PKT7(ring, CP_SET_MARKER, 1);
> > > -     OUT_RING(ring, 0x00d); /* IB1LIST start */
> > > +     if (submit->queue->flags & MSM_SUBMITQUEUE_ALLOW_PREEMPT) {
> > > +             OUT_PKT7(ring, CP_SET_MARKER, 1);
> > > +             OUT_RING(ring, 0x00d); /* IB1LIST start */
> > > +     }
> > >
> > >       /* Submit the commands */
> > >       for (i = 0; i < submit->nr_cmds; i++) {
> > > @@ -462,8 +464,10 @@ static void a7xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
> > >                       update_shadow_rptr(gpu, ring);
> > >       }
> > >
> > > -     OUT_PKT7(ring, CP_SET_MARKER, 1);
> > > -     OUT_RING(ring, 0x00e); /* IB1LIST end */
> > > +     if (submit->queue->flags & MSM_SUBMITQUEUE_ALLOW_PREEMPT) {
> > > +             OUT_PKT7(ring, CP_SET_MARKER, 1);
> > > +             OUT_RING(ring, 0x00e); /* IB1LIST end */
> > > +     }
> > >
> > >       get_stats_counter(ring, REG_A7XX_RBBM_PERFCTR_CP(0),
> > >               rbmemptr_stats(ring, index, cpcycles_end));
> > > diff --git a/drivers/gpu/drm/msm/msm_submitqueue.c b/drivers/gpu/drm/msm/msm_submitqueue.c
> > > index 0e803125a325..9b3ffca3f3b4 100644
> > > --- a/drivers/gpu/drm/msm/msm_submitqueue.c
> > > +++ b/drivers/gpu/drm/msm/msm_submitqueue.c
> > > @@ -170,6 +170,9 @@ int msm_submitqueue_create(struct drm_device *drm, struct msm_file_private *ctx,
> > >       if (!priv->gpu)
> > >               return -ENODEV;
> > >
> > > +     if (flags & MSM_SUBMITQUEUE_ALLOW_PREEMPT && priv->gpu->nr_rings == 1)
> > > +             return -EINVAL;
> > > +
> > >       ret = msm_gpu_convert_priority(priv->gpu, prio, &ring_nr, &sched_prio);
> > >       if (ret)
> > >               return ret;
> > > diff --git a/include/uapi/drm/msm_drm.h b/include/uapi/drm/msm_drm.h
> > > index 3fca72f73861..f37858db34e6 100644
> > > --- a/include/uapi/drm/msm_drm.h
> > > +++ b/include/uapi/drm/msm_drm.h
> > > @@ -345,7 +345,10 @@ struct drm_msm_gem_madvise {
> > >   * backwards compatibility as a "default" submitqueue
> > >   */
> > >
> > > -#define MSM_SUBMITQUEUE_FLAGS (0)
> > > +#define MSM_SUBMITQUEUE_ALLOW_PREEMPT        0x00000001
> > > +#define MSM_SUBMITQUEUE_FLAGS                    ( \
> > > +             MSM_SUBMITQUEUE_ALLOW_PREEMPT | \
> > > +             0)
> > >
> > >  /*
> > >   * The submitqueue priority should be between 0 and MSM_PARAM_PRIORITIES-1,
> > >
> > > --
> > > 2.46.0
> > >

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v4 10/11] drm/msm/A6xx: Enable preemption for A750
  2024-09-17 11:14 [PATCH v4 00/11] Preemption support for A7XX Antonino Maniscalco
                   ` (8 preceding siblings ...)
  2024-09-17 11:14 ` [PATCH v4 09/11] drm/msm/A6XX: Add a flag to allow preemption to submitqueue_create Antonino Maniscalco
@ 2024-09-17 11:14 ` Antonino Maniscalco
  2024-09-20 17:02   ` Akhil P Oommen
  2024-09-17 11:14 ` [PATCH v4 11/11] Documentation: document adreno preemption Antonino Maniscalco
  2024-09-18  7:46 ` [PATCH v4 00/11] Preemption support for A7XX Neil Armstrong
  11 siblings, 1 reply; 35+ messages in thread
From: Antonino Maniscalco @ 2024-09-17 11:14 UTC (permalink / raw)
  To: Rob Clark, Sean Paul, Konrad Dybcio, Abhinav Kumar,
	Dmitry Baryshkov, Marijn Suijten, David Airlie, Daniel Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet
  Cc: linux-arm-msm, dri-devel, freedreno, linux-kernel, linux-doc,
	Antonino Maniscalco, Neil Armstrong

Initialize with 4 rings to enable preemption.

For now only on A750 as other targets require testing.

Add the "enable_preemption" module parameter to override this for other
A7xx targets.

Signed-off-by: Antonino Maniscalco <antomani103@gmail.com>
Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8650-QRD
---
 drivers/gpu/drm/msm/adreno/a6xx_catalog.c | 3 ++-
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c     | 6 +++++-
 drivers/gpu/drm/msm/adreno/adreno_gpu.h   | 1 +
 drivers/gpu/drm/msm/msm_drv.c             | 4 ++++
 4 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_catalog.c b/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
index 316f23ca9167..0e3041b29419 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
@@ -1240,7 +1240,8 @@ static const struct adreno_info a7xx_gpus[] = {
 		.gmem = 3 * SZ_1M,
 		.inactive_period = DRM_MSM_INACTIVE_PERIOD,
 		.quirks = ADRENO_QUIRK_HAS_CACHED_COHERENT |
-			  ADRENO_QUIRK_HAS_HW_APRIV,
+			  ADRENO_QUIRK_HAS_HW_APRIV |
+			  ADRENO_QUIRK_PREEMPTION,
 		.init = a6xx_gpu_init,
 		.zapfw = "gen70900_zap.mbn",
 		.a6xx = &(const struct a6xx_info) {
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index edbcb6d229ba..4760f9469613 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -2529,6 +2529,7 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
 	struct a6xx_gpu *a6xx_gpu;
 	struct adreno_gpu *adreno_gpu;
 	struct msm_gpu *gpu;
+	extern int enable_preemption;
 	bool is_a7xx;
 	int ret;
 
@@ -2567,7 +2568,10 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
 		return ERR_PTR(ret);
 	}
 
-	if (is_a7xx)
+	if ((enable_preemption == 1) || (enable_preemption == -1 &&
+	    (config->info->quirks & ADRENO_QUIRK_PREEMPTION)))
+		ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs_a7xx, 4);
+	else if (is_a7xx)
 		ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs_a7xx, 1);
 	else if (adreno_has_gmu_wrapper(adreno_gpu))
 		ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs_gmuwrapper, 1);
diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.h b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
index 87098567483b..d1cd53f05de6 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.h
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
@@ -56,6 +56,7 @@ enum adreno_family {
 #define ADRENO_QUIRK_LMLOADKILL_DISABLE		BIT(2)
 #define ADRENO_QUIRK_HAS_HW_APRIV		BIT(3)
 #define ADRENO_QUIRK_HAS_CACHED_COHERENT	BIT(4)
+#define ADRENO_QUIRK_PREEMPTION			BIT(5)
 
 /* Helper for formating the chip_id in the way that userspace tools like
  * crashdec expect.
diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
index 9c33f4e3f822..7c64b20f5e3b 100644
--- a/drivers/gpu/drm/msm/msm_drv.c
+++ b/drivers/gpu/drm/msm/msm_drv.c
@@ -58,6 +58,10 @@ static bool modeset = true;
 MODULE_PARM_DESC(modeset, "Use kernel modesetting [KMS] (1=on (default), 0=disable)");
 module_param(modeset, bool, 0600);
 
+int enable_preemption = -1;
+MODULE_PARM_DESC(enable_preemption, "Enable preemption (A7xx only) (1=on , 0=disable, -1=auto (default))");
+module_param(enable_preemption, int, 0600);
+
 #ifdef CONFIG_FAULT_INJECTION
 DECLARE_FAULT_ATTR(fail_gem_alloc);
 DECLARE_FAULT_ATTR(fail_gem_iova);

-- 
2.46.0
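
The ring-count decision made in a6xx_gpu_init() above can be read as a small
pure function of the module parameter and the catalog quirk. The sketch below
restates it outside the driver; `preempt_nr_rings()` and the `PREEMPT_*` names
are hypothetical and exist only for illustration, while the values -1/0/1 and
the ring counts 4 and 1 come from the diff.

```c
#include <stdbool.h>

/* Tri-state values of the enable_preemption parameter (names hypothetical). */
enum {
	PREEMPT_AUTO = -1,	/* default: follow ADRENO_QUIRK_PREEMPTION */
	PREEMPT_OFF  = 0,
	PREEMPT_ON   = 1,
};

/*
 * Returns the number of rings to initialize: 4 when preemption is forced on
 * or when "auto" meets a target carrying the preemption quirk, 1 otherwise.
 */
static int preempt_nr_rings(int enable_preemption, bool has_preemption_quirk)
{
	if (enable_preemption == PREEMPT_ON)
		return 4;

	if (enable_preemption == PREEMPT_AUTO && has_preemption_quirk)
		return 4;

	return 1;
}
```

Setting the parameter to 0 therefore disables preemption even on A750, which
carries the quirk in the catalog.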


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 10/11] drm/msm/A6xx: Enable preemption for A750
  2024-09-17 11:14 ` [PATCH v4 10/11] drm/msm/A6xx: Enable preemption for A750 Antonino Maniscalco
@ 2024-09-20 17:02   ` Akhil P Oommen
  0 siblings, 0 replies; 35+ messages in thread
From: Akhil P Oommen @ 2024-09-20 17:02 UTC (permalink / raw)
  To: Antonino Maniscalco
  Cc: Rob Clark, Sean Paul, Konrad Dybcio, Abhinav Kumar,
	Dmitry Baryshkov, Marijn Suijten, David Airlie, Daniel Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet, linux-arm-msm, dri-devel, freedreno,
	linux-kernel, linux-doc, Neil Armstrong

On Tue, Sep 17, 2024 at 01:14:20PM +0200, Antonino Maniscalco wrote:
> Initialize with 4 rings to enable preemption.
> 
> For now only on A750 as other targets require testing.
> 
> Add the "preemption_enabled" module parameter to override this for other
> A7xx targets.
> 
> Signed-off-by: Antonino Maniscalco <antomani103@gmail.com>
> Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8650-QRD
> ---
>  drivers/gpu/drm/msm/adreno/a6xx_catalog.c | 3 ++-
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c     | 6 +++++-
>  drivers/gpu/drm/msm/adreno/adreno_gpu.h   | 1 +
>  drivers/gpu/drm/msm/msm_drv.c             | 4 ++++
>  4 files changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_catalog.c b/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
> index 316f23ca9167..0e3041b29419 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
> @@ -1240,7 +1240,8 @@ static const struct adreno_info a7xx_gpus[] = {
>  		.gmem = 3 * SZ_1M,
>  		.inactive_period = DRM_MSM_INACTIVE_PERIOD,
>  		.quirks = ADRENO_QUIRK_HAS_CACHED_COHERENT |
> -			  ADRENO_QUIRK_HAS_HW_APRIV,
> +			  ADRENO_QUIRK_HAS_HW_APRIV |
> +			  ADRENO_QUIRK_PREEMPTION,
>  		.init = a6xx_gpu_init,
>  		.zapfw = "gen70900_zap.mbn",
>  		.a6xx = &(const struct a6xx_info) {
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> index edbcb6d229ba..4760f9469613 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> @@ -2529,6 +2529,7 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
>  	struct a6xx_gpu *a6xx_gpu;
>  	struct adreno_gpu *adreno_gpu;
>  	struct msm_gpu *gpu;
> +	extern int enable_preemption;
>  	bool is_a7xx;
>  	int ret;
>  
> @@ -2567,7 +2568,10 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
>  		return ERR_PTR(ret);
>  	}
>  
> -	if (is_a7xx)
> +	if ((enable_preemption == 1) || (enable_preemption == -1 &&
> +	    (config->info->quirks & ADRENO_QUIRK_PREEMPTION)))
> +		ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs_a7xx, 4);
> +	else if (is_a7xx)
>  		ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs_a7xx, 1);
>  	else if (adreno_has_gmu_wrapper(adreno_gpu))
>  		ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs_gmuwrapper, 1);
> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.h b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> index 87098567483b..d1cd53f05de6 100644
> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> @@ -56,6 +56,7 @@ enum adreno_family {
>  #define ADRENO_QUIRK_LMLOADKILL_DISABLE		BIT(2)
>  #define ADRENO_QUIRK_HAS_HW_APRIV		BIT(3)
>  #define ADRENO_QUIRK_HAS_CACHED_COHERENT	BIT(4)
> +#define ADRENO_QUIRK_PREEMPTION			BIT(5)
>  
>  /* Helper for formating the chip_id in the way that userspace tools like
>   * crashdec expect.
> diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
> index 9c33f4e3f822..7c64b20f5e3b 100644
> --- a/drivers/gpu/drm/msm/msm_drv.c
> +++ b/drivers/gpu/drm/msm/msm_drv.c
> @@ -58,6 +58,10 @@ static bool modeset = true;
>  MODULE_PARM_DESC(modeset, "Use kernel modesetting [KMS] (1=on (default), 0=disable)");
>  module_param(modeset, bool, 0600);
>  
> +int enable_preemption = -1;
> +MODULE_PARM_DESC(enable_preemption, "Enable preemption (A7xx only) (1=on , 0=disable, -1=auto (default))");
> +module_param(enable_preemption, int, 0600);
> +

Is adreno_device.c a better place for adreno specific module params?

-Akhil.

>  #ifdef CONFIG_FAULT_INJECTION
>  DECLARE_FAULT_ATTR(fail_gem_alloc);
>  DECLARE_FAULT_ATTR(fail_gem_iova);
> 
> -- 
> 2.46.0
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v4 11/11] Documentation: document adreno preemption
  2024-09-17 11:14 [PATCH v4 00/11] Preemption support for A7XX Antonino Maniscalco
                   ` (9 preceding siblings ...)
  2024-09-17 11:14 ` [PATCH v4 10/11] drm/msm/A6xx: Enable preemption for A750 Antonino Maniscalco
@ 2024-09-17 11:14 ` Antonino Maniscalco
  2024-09-19 14:19   ` Connor Abbott
  2024-09-18  7:46 ` [PATCH v4 00/11] Preemption support for A7XX Neil Armstrong
  11 siblings, 1 reply; 35+ messages in thread
From: Antonino Maniscalco @ 2024-09-17 11:14 UTC (permalink / raw)
  To: Rob Clark, Sean Paul, Konrad Dybcio, Abhinav Kumar,
	Dmitry Baryshkov, Marijn Suijten, David Airlie, Daniel Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet
  Cc: linux-arm-msm, dri-devel, freedreno, linux-kernel, linux-doc,
	Antonino Maniscalco

Add documentation about the preemption feature supported by the msm
driver.

Signed-off-by: Antonino Maniscalco <antomani103@gmail.com>
---
 Documentation/gpu/msm-preemption.rst | 98 ++++++++++++++++++++++++++++++++++++
 1 file changed, 98 insertions(+)

diff --git a/Documentation/gpu/msm-preemption.rst b/Documentation/gpu/msm-preemption.rst
new file mode 100644
index 000000000000..c1203524da2e
--- /dev/null
+++ b/Documentation/gpu/msm-preemption.rst
@@ -0,0 +1,98 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+:orphan:
+
+==============
+MSM Preemption
+==============
+
+Preemption allows Adreno GPUs to switch to a higher priority ring when work is
+pushed to it, reducing latency for high priority submissions.
+
+When preemption is enabled 4 rings are initialized, corresponding to different
+priority levels. Having multiple rings is purely a software concept as the GPU
+only has registers to keep track of one graphics ring.
+The kernel is able to switch which ring is currently being processed by
+requesting preemption. When certain conditions are met, depending on the
+priority level, the GPU saves its current state in a series of buffers,
+then restores state from a similar set of buffers specified by the kernel. It
+then resumes execution and fires an IRQ to let the kernel know the context
+switch has completed.
+
+This mechanism can be used by the kernel to switch between rings. Whenever a
+submission occurs the kernel finds the highest priority ring which isn't empty
+and preempts to it if said ring is not the one being currently executed. This is
+also done whenever a submission completes to make sure execution resumes on a
+lower priority ring when a higher priority ring is done.
+
+Preemption levels
+-----------------
+
+Preemption can only occur at certain boundaries. The exact conditions can be
+configured by changing the preemption level, which allows a trade-off between
+latency (i.e. the time that passes between when the kernel requests preemption
+and when the SQE begins saving state) and overhead (the amount of state that
+needs to be saved).
+
+The GPU offers 3 levels:
+
+Level 0
+  Preemption only occurs at the submission level. This requires the least amount
+  of state to be saved as the execution of userspace submitted IBs is never
+  interrupted, however it offers very little benefit compared to not enabling
+  preemption of any kind.
+
+Level 1
+  Preemption occurs at either bin level, if using GMEM rendering, or draw level
+  in the sysmem rendering case.
+
+Level 2
+  Preemption occurs at draw level.
+
+Level 1 is the mode that is used by the msm driver.
+
+Additionally the GPU supports a `skip_save_restore` option. This disables
+the saving and restoring of certain registers, which lowers the overhead.
+When this is enabled, userspace is expected to set the state that isn't
+preserved whenever preemption occurs, which is done by specifying a preamble
+and a postamble. Those are IBs that are executed before and after
+preemption.
+
+Preemption buffers
+------------------
+
+A series of buffers is necessary to store the state of rings while they are not
+being executed. There are different kinds of preemption records and most of
+those require one buffer per ring. This is because preemption never occurs
+between submissions on the same ring, which always run in sequence when the ring
+is active. This means that only one context per ring is effectively active.
+
+SMMU_INFO
+  This buffer contains info about the current SMMU configuration such as the
+  ttbr0 register. The SQE firmware isn't actually able to save this record.
+  As a result SMMU info must be saved manually from the CP to a buffer and the
+  SMMU record updated with info from said buffer before triggering
+  preemption.
+
+NON_SECURE
+  This is the main preemption record where most state is saved. It is mostly
+  opaque to the kernel except for the first few words that must be initialized
+  by the kernel.
+
+SECURE
+  This saves state related to the GPU's secure mode.
+
+NON_PRIV
+  The intended purpose of this record is unknown. The SQE firmware actually
+  ignores it and therefore msm doesn't handle it.
+
+COUNTER
+  This record is used to save and restore performance counters.
+
+Handling the permissions of those buffers is critical for security. All but the
+NON_PRIV records need to be inaccessible from userspace, so they must be mapped
+in the kernel address space with the MSM_BO_MAP_PRIV flag.
+For example, making the NON_SECURE record accessible from userspace would allow
+any process to manipulate a saved ring's RPTR which can be used to skip the
+execution of some packets in a ring and execute user commands with higher
+privileges.

-- 
2.46.0


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 11/11] Documentation: document adreno preemption
  2024-09-17 11:14 ` [PATCH v4 11/11] Documentation: document adreno preemption Antonino Maniscalco
@ 2024-09-19 14:19   ` Connor Abbott
  0 siblings, 0 replies; 35+ messages in thread
From: Connor Abbott @ 2024-09-19 14:19 UTC (permalink / raw)
  To: Antonino Maniscalco
  Cc: Rob Clark, Sean Paul, Konrad Dybcio, Abhinav Kumar,
	Dmitry Baryshkov, Marijn Suijten, David Airlie, Daniel Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet, linux-arm-msm, dri-devel, freedreno,
	linux-kernel, linux-doc

On Tue, Sep 17, 2024 at 12:14 PM Antonino Maniscalco
<antomani103@gmail.com> wrote:
>
> Add documentation about the preemption feature supported by the msm
> driver.
>
> Signed-off-by: Antonino Maniscalco <antomani103@gmail.com>
> ---
>  Documentation/gpu/msm-preemption.rst | 98 ++++++++++++++++++++++++++++++++++++
>  1 file changed, 98 insertions(+)
>
> diff --git a/Documentation/gpu/msm-preemption.rst b/Documentation/gpu/msm-preemption.rst
> new file mode 100644
> index 000000000000..c1203524da2e
> --- /dev/null
> +++ b/Documentation/gpu/msm-preemption.rst
> @@ -0,0 +1,98 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +:orphan:
> +
> +=============
> +MSM Preemtion
> +=============
> +
> +Preemption allows Adreno GPUs to switch to an higher priority ring when work is
> +pushed to it, reducing latency for high priority submissions.
> +
> +When preemption is enabled 4 rings are initialized, corresponding to different
> +priority levels. Having multiple rings is purely a software concept as the GPU
> +only has registers to keep track of one graphics ring.
> +The kernel is able to switch which ring is currently being processed by
> +requesting preemption. When certain conditions are met, depending on the
> +priority level, the GPU will save its current state in a series of buffers,
> +then restores state from a similar set of buffers specified by the kernel. It
> +then resumes execution and fires an IRQ to let the kernel know the context
> +switch has completed.
> +
> +This mechanism can be used by the kernel to switch between rings. Whenever a
> +submission occurs the kernel finds the highest priority ring which isn't empty
> +and preempts to it if said ring is not the one being currently executed. This is
> +also done whenever a submission completes to make sure execution resumes on a
> +lower priority ring when a higher priority ring is done.
> +
> +Preemption levels
> +-----------------
> +
> +Preemption can only occur at certain boundaries. The exact conditions can be
> +configured by changing the preemption level, this allows to compromise between
> +latency (ie. the time that passes between when the kernel requests preemption
> +and when the SQE begins saving state) and overhead (the amount of state that
> +needs to be saved).
> +
> +The GPU offers 3 levels:
> +
> +Level 0
> +  Preemption only occurs at the submission level. This requires the least amount
> +  of state to be saved as the execution of userspace submitted IBs is never
> +  interrupted, however it offers very little benefit compared to not enabling
> +  preemption of any kind.
> +
> +Level 1
> +  Preemption occurs at either bin level, if using GMEM rendering, or draw level
> +  in the sysmem rendering case.
> +
> +Level 2
> +  Preemption occurs at draw level.
> +
> +Level 1 is the mode that is used by the msm driver.
> +
> +Additionally the GPU allows to specify a `skip_save_restore` option. This
> +disables the saving and restoring of certain registers which lowers the
> +overhead. When enabling this userspace is expected to set the state that isn't
> +preserved whenever preemption occurs which is done by specifying preamble and
> +postambles. Those are IBs that are executed before and after
> +preemption.

This should mention that `skip_save_restore` only works with level 1
preemption and only if using GMEM rendering. Also, be a bit more
specific than "certain registers" - maybe something like "all
registers except those relating to the operation of the SQE itself."

> +
> +Preemption buffers
> +------------------
> +
> +A series of buffers are necessary to store the state of rings while they are not
> +being executed. There are different kinds of preemption records and most of
> +those require one buffer per ring. This is because preemption never occurs
> +between submissions on the same ring, which always run in sequence when the ring
> +is active. This means that only one context per ring is effectively active.
> +
> +SMMU_INFO
> +  This buffer contains info about the current SMMU configuration such as the
> +  ttbr0 register. The SQE firmware isn't actually able to save this record.
> +  As a result SMMU info must be saved manually from the CP to a buffer and the
> +  SMMU record updated with info from said buffer before triggering
> +  preemption.
> +
> +NON_SECURE
> +  This is the main preemption record where most state is saved. It is mostly
> +  opaque to the kernel except for the first few words that must be initialized
> +  by the kernel.
> +
> +SECURE
> +  This saves state related to the GPU's secure mode.
> +
> +NON_PRIV
> +  The intended purpose of this record is unknown. The SQE firmware actually
> +  ignores it and therefore msm doesn't handle it.
> +
> +COUNTER
> +  This record is used to save and restore performance counters.
> +
> +Handling the permissions of those buffers is critical for security. All but the
> +NON_PRIV records need to be inaccessible from userspace, so they must be mapped
> +in the kernel address space with the MSM_BO_MAP_PRIV flag.
> +For example, making the NON_SECURE record accessible from userspace would allow
> +any process to manipulate a saved ring's RPTR which can be used to skip the
> +execution of some packets in a ring and execute user commands with higher
> +privileges.
>
> --
> 2.46.0
>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 00/11] Preemption support for A7XX
  2024-09-17 11:14 [PATCH v4 00/11] Preemption support for A7XX Antonino Maniscalco
                   ` (10 preceding siblings ...)
  2024-09-17 11:14 ` [PATCH v4 11/11] Documentation: document adreno preemption Antonino Maniscalco
@ 2024-09-18  7:46 ` Neil Armstrong
  2024-09-18 11:20   ` Antonino Maniscalco
                     ` (2 more replies)
  11 siblings, 3 replies; 35+ messages in thread
From: Neil Armstrong @ 2024-09-18  7:46 UTC (permalink / raw)
  To: Antonino Maniscalco, Rob Clark, Sean Paul, Konrad Dybcio,
	Abhinav Kumar, Dmitry Baryshkov, Marijn Suijten, David Airlie,
	Daniel Vetter, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, Jonathan Corbet
  Cc: linux-arm-msm, dri-devel, freedreno, linux-kernel, linux-doc,
	Akhil P Oommen, Sharat Masetty

Hi,

On 17/09/2024 13:14, Antonino Maniscalco wrote:
> This series implements preemption for A7XX targets, which allows the GPU to
> switch to an higher priority ring when work is pushed to it, reducing latency
> for high priority submissions.
> 
> This series enables L1 preemption with skip_save_restore which requires
> the following userspace patches to function:
> 
> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30544
> 
> A flag is added to `msm_submitqueue_create` to only allow submissions
> from compatible userspace to be preempted, therefore maintaining
> compatibility.
> 
> Preemption is currently only enabled by default on A750, it can be
> enabled on other targets through the `enable_preemption` module
> parameter. This is because more testing is required on other targets.
> 
> For testing on other HW it is sufficient to set that parameter to a
> value of 1, then using the branch of mesa linked above, `TU_DEBUG=hiprio`
> allows to run any application as high priority therefore preempting
> submissions from other applications.
> 
> The `msm_gpu_preemption_trigger` and `msm_gpu_preemption_irq` traces
> added in this series can be used to observe preemption's behavior as
> well as measuring preemption latency.
> 
> Some commits from this series are based on a previous series to enable
> preemption on A6XX targets:
> 
> https://lkml.kernel.org/1520489185-21828-1-git-send-email-smasetty@codeaurora.org
> 
> Signed-off-by: Antonino Maniscalco <antomani103@gmail.com>
> ---
> Changes in v4:
> - Added missing register in pwrup list
> - Removed and rearrange barriers
> - Renamed `skip_inline_wptr` to `restore_wptr`
> - Track ctx seqno per ring
> - Removed secure preempt context
> - NOP out postamble to disable it instantly
> - Only emit pwrup reglist once
> - Document bv_rptr_addr
> - Removed unused A6XX_PREEMPT_USER_RECORD_SIZE
> - Set name on preempt record buffer
> - Link to v3: https://lore.kernel.org/r/20240905-preemption-a750-t-v3-0-fd947699f7bc@gmail.com
> 
> Changes in v3:
> - Added documentation about preemption
> - Use quirks to determine which target supports preemption
> - Add a module parameter to force disabling or enabling preemption
> - Clear postamble when profiling
> - Define A6XX_CP_CONTEXT_SWITCH_CNTL_LEVEL fields in a6xx.xml
> - Make preemption records MAP_PRIV
> - Removed user ctx record (NON_PRIV) and patch 2/9 as it's not needed
>    anymore
> - Link to v2: https://lore.kernel.org/r/20240830-preemption-a750-t-v2-0-86aeead2cd80@gmail.com
> 
> Changes in v2:
> - Added preept_record_size for X185 in PATCH 3/7
> - Added patches to reset perf counters
> - Dropped unused defines
> - Dropped unused variable (fixes warning)
> - Only enable preemption on a750
> - Reject MSM_SUBMITQUEUE_ALLOW_PREEMPT for unsupported targets
> - Added Akhil's Reviewed-By tags to patches 1/9,2/9,3/9
> - Added Neil's Tested-By tags
> - Added explanation for UAPI changes in commit message
> - Link to v1: https://lore.kernel.org/r/20240815-preemption-a750-t-v1-0-7bda26c34037@gmail.com
> 
> ---
> Antonino Maniscalco (11):
>        drm/msm: Fix bv_fence being used as bv_rptr
>        drm/msm/A6XX: Track current_ctx_seqno per ring
>        drm/msm: Add a `preempt_record_size` field
>        drm/msm: Add CONTEXT_SWITCH_CNTL bitfields
>        drm/msm/A6xx: Implement preemption for A7XX targets
>        drm/msm/A6xx: Sync relevant adreno_pm4.xml changes
>        drm/msm/A6xx: Use posamble to reset counters on preemption
>        drm/msm/A6xx: Add traces for preemption
>        drm/msm/A6XX: Add a flag to allow preemption to submitqueue_create
>        drm/msm/A6xx: Enable preemption for A750
>        Documentation: document adreno preemption
> 
>   Documentation/gpu/msm-preemption.rst               |  98 +++++
>   drivers/gpu/drm/msm/Makefile                       |   1 +
>   drivers/gpu/drm/msm/adreno/a2xx_gpu.c              |   2 +-
>   drivers/gpu/drm/msm/adreno/a3xx_gpu.c              |   2 +-
>   drivers/gpu/drm/msm/adreno/a4xx_gpu.c              |   2 +-
>   drivers/gpu/drm/msm/adreno/a5xx_gpu.c              |   6 +-
>   drivers/gpu/drm/msm/adreno/a6xx_catalog.c          |   7 +-
>   drivers/gpu/drm/msm/adreno/a6xx_gpu.c              | 325 ++++++++++++++-
>   drivers/gpu/drm/msm/adreno/a6xx_gpu.h              | 174 ++++++++
>   drivers/gpu/drm/msm/adreno/a6xx_preempt.c          | 440 +++++++++++++++++++++
>   drivers/gpu/drm/msm/adreno/adreno_gpu.h            |   9 +-
>   drivers/gpu/drm/msm/msm_drv.c                      |   4 +
>   drivers/gpu/drm/msm/msm_gpu.c                      |   2 +-
>   drivers/gpu/drm/msm/msm_gpu.h                      |  11 -
>   drivers/gpu/drm/msm/msm_gpu_trace.h                |  28 ++
>   drivers/gpu/drm/msm/msm_ringbuffer.h               |  18 +
>   drivers/gpu/drm/msm/msm_submitqueue.c              |   3 +
>   drivers/gpu/drm/msm/registers/adreno/a6xx.xml      |   7 +-
>   .../gpu/drm/msm/registers/adreno/adreno_pm4.xml    |  39 +-
>   include/uapi/drm/msm_drm.h                         |   5 +-
>   20 files changed, 1117 insertions(+), 66 deletions(-)
> ---
> base-commit: 7c626ce4bae1ac14f60076d00eafe71af30450ba
> change-id: 20240815-preemption-a750-t-fcee9a844b39
> 
> Best regards,

I've been running vulkan-cts (1.3.7.3-0-gd71a36db16d98313c431829432a136dbda692a08 from Yocto)
on SM8650-QRD, SM8550-QRD & SM8450-HDK boards with enable_preemption both at its default value
and forced to 1, and I've seen no regressions so far.
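
For anyone reproducing the two configurations, here is a minimal Python sketch
for checking which value is in effect. It assumes the parameter is registered
by the `msm` module and exposed at the usual sysfs location
/sys/module/msm/parameters/enable_preemption, and that the caller is allowed
to read it; neither is confirmed in this thread, and the helper name is
hypothetical.

```python
from pathlib import Path
from typing import Optional


def read_enable_preemption(sysfs_root: Path = Path("/sys/module")) -> Optional[str]:
    """Return the parameter value as text, or None if it cannot be read.

    Assumption: the parameter lives under the `msm` module directory.
    """
    param = sysfs_root / "msm" / "parameters" / "enable_preemption"
    try:
        return param.read_text().strip()
    except (FileNotFoundError, PermissionError):
        # Module not loaded, parameter not exposed, or insufficient rights.
        return None


if __name__ == "__main__":
    print("enable_preemption =", read_enable_preemption())
```

A result of None only means the file could not be read; it says nothing about
whether preemption is active.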

On SM8550, I've seen a few:
platform 3d6a000.gmu: [drm:a6xx_hfi_send_msg.constprop.0 [msm]] *ERROR* Message HFI_H2F_MSG_GX_BW_PERF_VOTE id 2743 timed out waiting for response
platform 3d6a000.gmu: [drm:a6xx_hfi_send_msg.constprop.0 [msm]] *ERROR* Unexpected message id 2743 on the response queue
but it's unrelated to preempt

and on SM8450:
platform 3d6a000.gmu: [drm:a6xx_gmu_set_oob [msm]] *ERROR* Timeout waiting for GMU OOB set GPU_SET: 0x0
msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 7.3.0.1: hangcheck detected gpu lockup rb 0!
msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 7.3.0.1:     completed fence: 331235
msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 7.3.0.1:     submitted fence: 331236
adreno 3d00000.gpu: [drm:a6xx_irq [msm]] *ERROR* gpu fault ring 0 fence 50de4 status 00800005 rb 0000/0699 ib1 0000000000000000/0000 ib2 0000000000000000/0000
msm_dpu ae01000.display-controller: [drm:recover_worker [msm]] *ERROR* 7.3.0.1: hangcheck recover!
msm_dpu ae01000.display-controller: [drm:recover_worker [msm]] *ERROR* 7.3.0.1: offending task: deqp-vk (/usr/lib/vulkan-cts/deqp-vk)
msm_dpu ae01000.display-controller: [drm:recover_worker [msm]] *ERROR* 7.3.0.1: hangcheck recover!
leading to a VK_ERROR_DEVICE_LOST, but again unrelated to preempt support.
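
As a cross-check of the SM8450 log above: assuming the `fence` field on the
`gpu fault` line is printed in hexadecimal, 0x50de4 equals 331236, which
matches the submitted fence and is one past the completed fence, so a single
submission was outstanding when the lockup was detected. A small Python sketch
(log text trimmed from the lines above; the function name is illustrative):

```python
import re

# Excerpts copied from the SM8450 report above, with the device prefixes trimmed.
LOG = """\
hangcheck detected gpu lockup rb 0!
    completed fence: 331235
    submitted fence: 331236
gpu fault ring 0 fence 50de4 status 00800005 rb 0000/0699
"""


def fences(log: str) -> dict:
    """Extract the fence numbers; the 'gpu fault' fence is assumed to be hex."""
    completed = int(re.search(r"completed fence: (\d+)", log).group(1))
    submitted = int(re.search(r"submitted fence: (\d+)", log).group(1))
    fault = int(re.search(r"gpu fault ring \d+ fence ([0-9a-f]+)", log).group(1), 16)
    return {"completed": completed, "submitted": submitted, "fault": fault}


result = fences(LOG)
print(result)
print("outstanding submissions:", result["submitted"] - result["completed"])
```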

So you can also add:
Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8550-QRD
Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8450-HDK

Thanks,
Neil

* Re: [PATCH v4 00/11] Preemption support for A7XX
  2024-09-18  7:46 ` [PATCH v4 00/11] Preemption support for A7XX Neil Armstrong
@ 2024-09-18 11:20   ` Antonino Maniscalco
  2024-09-18 15:39   ` Rob Clark
  2024-09-20 17:09   ` Akhil P Oommen
  2 siblings, 0 replies; 35+ messages in thread
From: Antonino Maniscalco @ 2024-09-18 11:20 UTC (permalink / raw)
  To: neil.armstrong, Rob Clark, Sean Paul, Konrad Dybcio,
	Abhinav Kumar, Dmitry Baryshkov, Marijn Suijten, David Airlie,
	Daniel Vetter, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, Jonathan Corbet
  Cc: linux-arm-msm, dri-devel, freedreno, linux-kernel, linux-doc,
	Akhil P Oommen, Sharat Masetty

On 9/18/24 9:46 AM, Neil Armstrong wrote:
> Hi,
> 
> I've been running vulkan-cts 
> (1.3.7.3-0-gd71a36db16d98313c431829432a136dbda692a08 from Yocto)
> on SM8650-QRD, SM8550-QRD & SM8450-HDK boards with enable_preemption in 
> default value
> and forced to 1, and I've seen no regression so far
> 
> On SM8550, I've seen a few:
> platform 3d6a000.gmu: [drm:a6xx_hfi_send_msg.constprop.0 [msm]] *ERROR* 
> Message HFI_H2F_MSG_GX_BW_PERF_VOTE id 2743 timed out waiting for response
> platform 3d6a000.gmu: [drm:a6xx_hfi_send_msg.constprop.0 [msm]] *ERROR* 
> Unexpected message id 2743 on the response queue
> but it's unrelated to preempt
> 
> and on SM8450:
> platform 3d6a000.gmu: [drm:a6xx_gmu_set_oob [msm]] *ERROR* Timeout 
> waiting for GMU OOB set GPU_SET: 0x0
> msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] 
> *ERROR* 7.3.0.1: hangcheck detected gpu lockup rb 0!
> msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] 
> *ERROR* 7.3.0.1:     completed fence: 331235
> msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] 
> *ERROR* 7.3.0.1:     submitted fence: 331236
> adreno 3d00000.gpu: [drm:a6xx_irq [msm]] *ERROR* gpu fault ring 0 fence 
> 50de4 status 00800005 rb 0000/0699 ib1 0000000000000000/0000 ib2 
> 0000000000000000/0000
> msm_dpu ae01000.display-controller: [drm:recover_worker [msm]] *ERROR* 
> 7.3.0.1: hangcheck recover!
> msm_dpu ae01000.display-controller: [drm:recover_worker [msm]] *ERROR* 
> 7.3.0.1: offending task: deqp-vk (/usr/lib/vulkan-cts/deqp-vk)
> msm_dpu ae01000.display-controller: [drm:recover_worker [msm]] *ERROR* 
> 7.3.0.1: hangcheck recover!
> leading to a VK_ERROR_DEVICE_LOST, but again unrelated to preempt support.
> 
> So you can also add:
> Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8550-QRD
> Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8450-HDK
> 
> Thanks,
> Neil

Thanks for testing! Enabling for those targets then.

Best regards,
-- 
Antonino Maniscalco <antomani103@gmail.com>


* Re: [PATCH v4 00/11] Preemption support for A7XX
  2024-09-18  7:46 ` [PATCH v4 00/11] Preemption support for A7XX Neil Armstrong
  2024-09-18 11:20   ` Antonino Maniscalco
@ 2024-09-18 15:39   ` Rob Clark
  2024-09-20 16:14     ` Akhil P Oommen
  2024-09-20 17:09   ` Akhil P Oommen
  2 siblings, 1 reply; 35+ messages in thread
From: Rob Clark @ 2024-09-18 15:39 UTC (permalink / raw)
  To: neil.armstrong
  Cc: Antonino Maniscalco, Sean Paul, Konrad Dybcio, Abhinav Kumar,
	Dmitry Baryshkov, Marijn Suijten, David Airlie, Daniel Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet, linux-arm-msm, dri-devel, freedreno,
	linux-kernel, linux-doc, Akhil P Oommen, Sharat Masetty

On Wed, Sep 18, 2024 at 12:46 AM Neil Armstrong
<neil.armstrong@linaro.org> wrote:
>
> Hi,
>
> I've been running vulkan-cts (1.3.7.3-0-gd71a36db16d98313c431829432a136dbda692a08 from Yocto)
> on SM8650-QRD, SM8550-QRD & SM8450-HDK boards with enable_preemption in default value
> and forced to 1, and I've seen no regression so far
>
> On SM8550, I've seen a few:
> platform 3d6a000.gmu: [drm:a6xx_hfi_send_msg.constprop.0 [msm]] *ERROR* Message HFI_H2F_MSG_GX_BW_PERF_VOTE id 2743 timed out waiting for response
> platform 3d6a000.gmu: [drm:a6xx_hfi_send_msg.constprop.0 [msm]] *ERROR* Unexpected message id 2743 on the response queue
> but it's unrelated to preempt
>
> and on SM8450:
> platform 3d6a000.gmu: [drm:a6xx_gmu_set_oob [msm]] *ERROR* Timeout waiting for GMU OOB set GPU_SET: 0x0
> msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 7.3.0.1: hangcheck detected gpu lockup rb 0!
> msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 7.3.0.1:     completed fence: 331235
> msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 7.3.0.1:     submitted fence: 331236
> adreno 3d00000.gpu: [drm:a6xx_irq [msm]] *ERROR* gpu fault ring 0 fence 50de4 status 00800005 rb 0000/0699 ib1 0000000000000000/0000 ib2 0000000000000000/0000
> msm_dpu ae01000.display-controller: [drm:recover_worker [msm]] *ERROR* 7.3.0.1: hangcheck recover!
> msm_dpu ae01000.display-controller: [drm:recover_worker [msm]] *ERROR* 7.3.0.1: offending task: deqp-vk (/usr/lib/vulkan-cts/deqp-vk)
> msm_dpu ae01000.display-controller: [drm:recover_worker [msm]] *ERROR* 7.3.0.1: hangcheck recover!
> leading to a VK_ERROR_DEVICE_LOST, but again unrelated to preempt support.

I suspect on newer devices we have trouble resetting the GMU, leading
to (what I assume is happening here) the CPU thinking the GMU is in a
different state than it is.

This has led to some stability issues on a660 in mesa CI: if anything
crashes the GPU during a CI run, it tends to kill the rest of the run
until the board is power cycled.

https://gitlab.freedesktop.org/drm/msm/-/issues/37

I think we have some work to do on making recovery more robust on
anything newer than early a6xx.

BR,
-R

> So you can also add:
> Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8550-QRD
> Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8450-HDK
>
> Thanks,
> Neil

* Re: [PATCH v4 00/11] Preemption support for A7XX
  2024-09-18 15:39   ` Rob Clark
@ 2024-09-20 16:14     ` Akhil P Oommen
  2024-09-20 17:17       ` Rob Clark
  2024-10-22 15:05       ` Rob Clark
  0 siblings, 2 replies; 35+ messages in thread
From: Akhil P Oommen @ 2024-09-20 16:14 UTC (permalink / raw)
  To: Rob Clark, g
  Cc: neil.armstrong, Antonino Maniscalco, Sean Paul, Konrad Dybcio,
	Abhinav Kumar, Dmitry Baryshkov, Marijn Suijten, David Airlie,
	Daniel Vetter, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, Jonathan Corbet, linux-arm-msm, dri-devel,
	freedreno, linux-kernel, linux-doc, Sharat Masetty

On Wed, Sep 18, 2024 at 08:39:30AM -0700, Rob Clark wrote:
> On Wed, Sep 18, 2024 at 12:46 AM Neil Armstrong
> <neil.armstrong@linaro.org> wrote:
> >
> > Hi,
> >
> > I've been running vulkan-cts (1.3.7.3-0-gd71a36db16d98313c431829432a136dbda692a08 from Yocto)
> > on SM8650-QRD, SM8550-QRD & SM8450-HDK boards with enable_preemption in default value
> > and forced to 1, and I've seen no regression so far
> >
> > On SM8550, I've seen a few:
> > platform 3d6a000.gmu: [drm:a6xx_hfi_send_msg.constprop.0 [msm]] *ERROR* Message HFI_H2F_MSG_GX_BW_PERF_VOTE id 2743 timed out waiting for response
> > platform 3d6a000.gmu: [drm:a6xx_hfi_send_msg.constprop.0 [msm]] *ERROR* Unexpected message id 2743 on the response queue
> > but it's unrelated to preempt
> >
> > and on SM8450:
> > platform 3d6a000.gmu: [drm:a6xx_gmu_set_oob [msm]] *ERROR* Timeout waiting for GMU OOB set GPU_SET: 0x0
> > msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 7.3.0.1: hangcheck detected gpu lockup rb 0!
> > msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 7.3.0.1:     completed fence: 331235
> > msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 7.3.0.1:     submitted fence: 331236
> > adreno 3d00000.gpu: [drm:a6xx_irq [msm]] *ERROR* gpu fault ring 0 fence 50de4 status 00800005 rb 0000/0699 ib1 0000000000000000/0000 ib2 0000000000000000/0000
> > msm_dpu ae01000.display-controller: [drm:recover_worker [msm]] *ERROR* 7.3.0.1: hangcheck recover!
> > msm_dpu ae01000.display-controller: [drm:recover_worker [msm]] *ERROR* 7.3.0.1: offending task: deqp-vk (/usr/lib/vulkan-cts/deqp-vk)
> > msm_dpu ae01000.display-controller: [drm:recover_worker [msm]] *ERROR* 7.3.0.1: hangcheck recover!
> > leading to a VK_ERROR_DEVICE_LOST, but again unrelated to preempt support.
> 
> I suspect on newer devices we have trouble resetting the GMU, leading
> to (what I assume is happening here) the CPU thinking the GMU is in a
> different state than it is.
> 
> Which has led to some stability issues on a660 in mesa CI, if anything
> crashes the gpu in the CI run it tends to kill the rest of the run
> until the board is power cycled.
> 
> https://gitlab.freedesktop.org/drm/msm/-/issues/37
> 
> I think we have some work to do on making recovery more robust on
> things newer than early a6xx things.

Is this seen only in a particular scenario, or is recovery always
broken? I fixed recovery on 7c3 (a660 based) a couple of years ago, so
I'm not sure what exactly regressed. At least I didn't see any issue on
x185.

-Akhil.

> 
> BR,
> -R
> 
> > So you can also add:
> > Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8550-QRD
> > Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8450-HDK
> >
> > Thanks,
> > Neil

* Re: [PATCH v4 00/11] Preemption support for A7XX
  2024-09-20 16:14     ` Akhil P Oommen
@ 2024-09-20 17:17       ` Rob Clark
  2024-10-22 15:05       ` Rob Clark
  1 sibling, 0 replies; 35+ messages in thread
From: Rob Clark @ 2024-09-20 17:17 UTC (permalink / raw)
  To: Akhil P Oommen
  Cc: g, neil.armstrong, Antonino Maniscalco, Sean Paul, Abhinav Kumar,
	Dmitry Baryshkov, Marijn Suijten, David Airlie, Daniel Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet, linux-arm-msm, dri-devel, freedreno,
	linux-kernel, linux-doc, Sharat Masetty, Konrad Dybcio

On Fri, Sep 20, 2024 at 9:15 AM Akhil P Oommen <quic_akhilpo@quicinc.com> wrote:
>
> On Wed, Sep 18, 2024 at 08:39:30AM -0700, Rob Clark wrote:
> > On Wed, Sep 18, 2024 at 12:46 AM Neil Armstrong
> > <neil.armstrong@linaro.org> wrote:
> > >
> > > Hi,
> > >
> > > On 17/09/2024 13:14, Antonino Maniscalco wrote:
> > > > This series implements preemption for A7XX targets, which allows the GPU to
> > > > switch to an higher priority ring when work is pushed to it, reducing latency
> > > > for high priority submissions.
> > > >
> > > > This series enables L1 preemption with skip_save_restore which requires
> > > > the following userspace patches to function:
> > > >
> > > > https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30544
> > > >
> > > > A flag is added to `msm_submitqueue_create` to only allow submissions
> > > > from compatible userspace to be preempted, therefore maintaining
> > > > compatibility.
> > > >
> > > > Preemption is currently only enabled by default on A750, it can be
> > > > enabled on other targets through the `enable_preemption` module
> > > > parameter. This is because more testing is required on other targets.
> > > >
> > > > For testing on other HW it is sufficient to set that parameter to a
> > > > value of 1, then using the branch of mesa linked above, `TU_DEBUG=hiprio`
> > > > allows to run any application as high priority therefore preempting
> > > > submissions from other applications.
> > > >
> > > > The `msm_gpu_preemption_trigger` and `msm_gpu_preemption_irq` traces
> > > > added in this series can be used to observe preemption's behavior as
> > > > well as measuring preemption latency.
> > > >
> > > > Some commits from this series are based on a previous series to enable
> > > > preemption on A6XX targets:
> > > >
> > > > https://lkml.kernel.org/1520489185-21828-1-git-send-email-smasetty@codeaurora.org
> > > >
> > > > Signed-off-by: Antonino Maniscalco <antomani103@gmail.com>
> > > > ---
> > > > Changes in v4:
> > > > - Added missing register in pwrup list
> > > > - Removed and rearrange barriers
> > > > - Renamed `skip_inline_wptr` to `restore_wptr`
> > > > - Track ctx seqno per ring
> > > > - Removed secure preempt context
> > > > - NOP out postamble to disable it instantly
> > > > - Only emit pwrup reglist once
> > > > - Document bv_rptr_addr
> > > > - Removed unused A6XX_PREEMPT_USER_RECORD_SIZE
> > > > - Set name on preempt record buffer
> > > > - Link to v3: https://lore.kernel.org/r/20240905-preemption-a750-t-v3-0-fd947699f7bc@gmail.com
> > > >
> > > > Changes in v3:
> > > > - Added documentation about preemption
> > > > - Use quirks to determine which target supports preemption
> > > > - Add a module parameter to force disabling or enabling preemption
> > > > - Clear postamble when profiling
> > > > - Define A6XX_CP_CONTEXT_SWITCH_CNTL_LEVEL fields in a6xx.xml
> > > > - Make preemption records MAP_PRIV
> > > > - Removed user ctx record (NON_PRIV) and patch 2/9 as it's not needed
> > > >    anymore
> > > > - Link to v2: https://lore.kernel.org/r/20240830-preemption-a750-t-v2-0-86aeead2cd80@gmail.com
> > > >
> > > > Changes in v2:
> > > > - Added preept_record_size for X185 in PATCH 3/7
> > > > - Added patches to reset perf counters
> > > > - Dropped unused defines
> > > > - Dropped unused variable (fixes warning)
> > > > - Only enable preemption on a750
> > > > - Reject MSM_SUBMITQUEUE_ALLOW_PREEMPT for unsupported targets
> > > > - Added Akhil's Reviewed-By tags to patches 1/9,2/9,3/9
> > > > - Added Neil's Tested-By tags
> > > > - Added explanation for UAPI changes in commit message
> > > > - Link to v1: https://lore.kernel.org/r/20240815-preemption-a750-t-v1-0-7bda26c34037@gmail.com
> > > >
> > > > ---
> > > > Antonino Maniscalco (11):
> > > >        drm/msm: Fix bv_fence being used as bv_rptr
> > > >        drm/msm/A6XX: Track current_ctx_seqno per ring
> > > >        drm/msm: Add a `preempt_record_size` field
> > > >        drm/msm: Add CONTEXT_SWITCH_CNTL bitfields
> > > >        drm/msm/A6xx: Implement preemption for A7XX targets
> > > >        drm/msm/A6xx: Sync relevant adreno_pm4.xml changes
> > > >        drm/msm/A6xx: Use posamble to reset counters on preemption
> > > >        drm/msm/A6xx: Add traces for preemption
> > > >        drm/msm/A6XX: Add a flag to allow preemption to submitqueue_create
> > > >        drm/msm/A6xx: Enable preemption for A750
> > > >        Documentation: document adreno preemption
> > > >
> > > >   Documentation/gpu/msm-preemption.rst               |  98 +++++
> > > >   drivers/gpu/drm/msm/Makefile                       |   1 +
> > > >   drivers/gpu/drm/msm/adreno/a2xx_gpu.c              |   2 +-
> > > >   drivers/gpu/drm/msm/adreno/a3xx_gpu.c              |   2 +-
> > > >   drivers/gpu/drm/msm/adreno/a4xx_gpu.c              |   2 +-
> > > >   drivers/gpu/drm/msm/adreno/a5xx_gpu.c              |   6 +-
> > > >   drivers/gpu/drm/msm/adreno/a6xx_catalog.c          |   7 +-
> > > >   drivers/gpu/drm/msm/adreno/a6xx_gpu.c              | 325 ++++++++++++++-
> > > >   drivers/gpu/drm/msm/adreno/a6xx_gpu.h              | 174 ++++++++
> > > >   drivers/gpu/drm/msm/adreno/a6xx_preempt.c          | 440 +++++++++++++++++++++
> > > >   drivers/gpu/drm/msm/adreno/adreno_gpu.h            |   9 +-
> > > >   drivers/gpu/drm/msm/msm_drv.c                      |   4 +
> > > >   drivers/gpu/drm/msm/msm_gpu.c                      |   2 +-
> > > >   drivers/gpu/drm/msm/msm_gpu.h                      |  11 -
> > > >   drivers/gpu/drm/msm/msm_gpu_trace.h                |  28 ++
> > > >   drivers/gpu/drm/msm/msm_ringbuffer.h               |  18 +
> > > >   drivers/gpu/drm/msm/msm_submitqueue.c              |   3 +
> > > >   drivers/gpu/drm/msm/registers/adreno/a6xx.xml      |   7 +-
> > > >   .../gpu/drm/msm/registers/adreno/adreno_pm4.xml    |  39 +-
> > > >   include/uapi/drm/msm_drm.h                         |   5 +-
> > > >   20 files changed, 1117 insertions(+), 66 deletions(-)
> > > > ---
> > > > base-commit: 7c626ce4bae1ac14f60076d00eafe71af30450ba
> > > > change-id: 20240815-preemption-a750-t-fcee9a844b39
> > > >
> > > > Best regards,
> > >
> > > I've been running vulkan-cts (1.3.7.3-0-gd71a36db16d98313c431829432a136dbda692a08 from Yocto)
> > > on SM8650-QRD, SM8550-QRD & SM8450-HDK boards with enable_preemption in default value
> > > and forced to 1, and I've seen no regression so far
> > >
> > > On SM8550, I've seen a few:
> > > platform 3d6a000.gmu: [drm:a6xx_hfi_send_msg.constprop.0 [msm]] *ERROR* Message HFI_H2F_MSG_GX_BW_PERF_VOTE id 2743 timed out waiting for response
> > > platform 3d6a000.gmu: [drm:a6xx_hfi_send_msg.constprop.0 [msm]] *ERROR* Unexpected message id 2743 on the response queue
> > > but it's unrelated to preempt
> > >
> > > and on SM8450:
> > > platform 3d6a000.gmu: [drm:a6xx_gmu_set_oob [msm]] *ERROR* Timeout waiting for GMU OOB set GPU_SET: 0x0
> > > msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 7.3.0.1: hangcheck detected gpu lockup rb 0!
> > > msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 7.3.0.1:     completed fence: 331235
> > > msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 7.3.0.1:     submitted fence: 331236
> > > adreno 3d00000.gpu: [drm:a6xx_irq [msm]] *ERROR* gpu fault ring 0 fence 50de4 status 00800005 rb 0000/0699 ib1 0000000000000000/0000 ib2 0000000000000000/0000
> > > msm_dpu ae01000.display-controller: [drm:recover_worker [msm]] *ERROR* 7.3.0.1: hangcheck recover!
> > > msm_dpu ae01000.display-controller: [drm:recover_worker [msm]] *ERROR* 7.3.0.1: offending task: deqp-vk (/usr/lib/vulkan-cts/deqp-vk)
> > > msm_dpu ae01000.display-controller: [drm:recover_worker [msm]] *ERROR* 7.3.0.1: hangcheck recover!
> > > leading to a VK_ERROR_DEVICE_LOST, but again unrelated to preempt support.
> >
> > I suspect on newer devices we have trouble resetting the GMU, leading
> > to (what I assume is happening here) the CPU thinking the GMU is in a
> > different state than it is.
> >
> > Which has led to some stability issues on a660 in mesa CI, if anything
> > crashes the gpu in the CI run it tends to kill the rest of the run
> > until the board is power cycled.
> >
> > https://gitlab.freedesktop.org/drm/msm/-/issues/37
> >
> > I think we have some work to do on making recovery more robust on
> > things newer than early a6xx things.
>
> Is this seen only with a particular scenario or is recovery always
> broken? I fixed recovery on 7c3 (a660 based) a couple of year ago,
> not sure what exactly regressed. At least I didn't see any issue on
> x185.

It might only be certain scenarios. There is an igt recovery test; I
guess I can try that on a690 next week. Maybe we should see if we
could get the a660 boards into drm-ci, which would include the
msm_recovery test.

BR,
-R

> -Akhil.
>
> >
> > BR,
> > -R
> >
> > > So you can also add:
> > > Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8550-QRD
> > > Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8450-HDK
> > >
> > > Thanks,
> > > Neil

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 00/11] Preemption support for A7XX
  2024-09-20 16:14     ` Akhil P Oommen
  2024-09-20 17:17       ` Rob Clark
@ 2024-10-22 15:05       ` Rob Clark
  2024-10-23 16:21         ` Akhil P Oommen
  1 sibling, 1 reply; 35+ messages in thread
From: Rob Clark @ 2024-10-22 15:05 UTC (permalink / raw)
  To: Akhil P Oommen
  Cc: g, neil.armstrong, Antonino Maniscalco, Sean Paul, Abhinav Kumar,
	Dmitry Baryshkov, Marijn Suijten, David Airlie, Daniel Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet, linux-arm-msm, dri-devel, freedreno,
	linux-kernel, linux-doc, Sharat Masetty, Konrad Dybcio

On Fri, Sep 20, 2024 at 9:15 AM Akhil P Oommen <quic_akhilpo@quicinc.com> wrote:
>
> On Wed, Sep 18, 2024 at 08:39:30AM -0700, Rob Clark wrote:
> > On Wed, Sep 18, 2024 at 12:46 AM Neil Armstrong
> > <neil.armstrong@linaro.org> wrote:
> > >
> > > Hi,
> > >
> > > On 17/09/2024 13:14, Antonino Maniscalco wrote:
> > > > This series implements preemption for A7XX targets, which allows the GPU to
> > > > switch to an higher priority ring when work is pushed to it, reducing latency
> > > > for high priority submissions.
> > > >
> > > > This series enables L1 preemption with skip_save_restore which requires
> > > > the following userspace patches to function:
> > > >
> > > > https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30544
> > > >
> > > > A flag is added to `msm_submitqueue_create` to only allow submissions
> > > > from compatible userspace to be preempted, therefore maintaining
> > > > compatibility.
> > > >
> > > > Preemption is currently only enabled by default on A750, it can be
> > > > enabled on other targets through the `enable_preemption` module
> > > > parameter. This is because more testing is required on other targets.
> > > >
> > > > For testing on other HW it is sufficient to set that parameter to a
> > > > value of 1, then using the branch of mesa linked above, `TU_DEBUG=hiprio`
> > > > allows to run any application as high priority therefore preempting
> > > > submissions from other applications.
> > > >
> > > > The `msm_gpu_preemption_trigger` and `msm_gpu_preemption_irq` traces
> > > > added in this series can be used to observe preemption's behavior as
> > > > well as measuring preemption latency.
> > > >
> > > > Some commits from this series are based on a previous series to enable
> > > > preemption on A6XX targets:
> > > >
> > > > https://lkml.kernel.org/1520489185-21828-1-git-send-email-smasetty@codeaurora.org
> > > >
> > > > Signed-off-by: Antonino Maniscalco <antomani103@gmail.com>
> > > > ---
> > > > Changes in v4:
> > > > - Added missing register in pwrup list
> > > > - Removed and rearrange barriers
> > > > - Renamed `skip_inline_wptr` to `restore_wptr`
> > > > - Track ctx seqno per ring
> > > > - Removed secure preempt context
> > > > - NOP out postamble to disable it instantly
> > > > - Only emit pwrup reglist once
> > > > - Document bv_rptr_addr
> > > > - Removed unused A6XX_PREEMPT_USER_RECORD_SIZE
> > > > - Set name on preempt record buffer
> > > > - Link to v3: https://lore.kernel.org/r/20240905-preemption-a750-t-v3-0-fd947699f7bc@gmail.com
> > > >
> > > > Changes in v3:
> > > > - Added documentation about preemption
> > > > - Use quirks to determine which target supports preemption
> > > > - Add a module parameter to force disabling or enabling preemption
> > > > - Clear postamble when profiling
> > > > - Define A6XX_CP_CONTEXT_SWITCH_CNTL_LEVEL fields in a6xx.xml
> > > > - Make preemption records MAP_PRIV
> > > > - Removed user ctx record (NON_PRIV) and patch 2/9 as it's not needed
> > > >    anymore
> > > > - Link to v2: https://lore.kernel.org/r/20240830-preemption-a750-t-v2-0-86aeead2cd80@gmail.com
> > > >
> > > > Changes in v2:
> > > > - Added preept_record_size for X185 in PATCH 3/7
> > > > - Added patches to reset perf counters
> > > > - Dropped unused defines
> > > > - Dropped unused variable (fixes warning)
> > > > - Only enable preemption on a750
> > > > - Reject MSM_SUBMITQUEUE_ALLOW_PREEMPT for unsupported targets
> > > > - Added Akhil's Reviewed-By tags to patches 1/9,2/9,3/9
> > > > - Added Neil's Tested-By tags
> > > > - Added explanation for UAPI changes in commit message
> > > > - Link to v1: https://lore.kernel.org/r/20240815-preemption-a750-t-v1-0-7bda26c34037@gmail.com
> > > >
> > > > ---
> > > > Antonino Maniscalco (11):
> > > >        drm/msm: Fix bv_fence being used as bv_rptr
> > > >        drm/msm/A6XX: Track current_ctx_seqno per ring
> > > >        drm/msm: Add a `preempt_record_size` field
> > > >        drm/msm: Add CONTEXT_SWITCH_CNTL bitfields
> > > >        drm/msm/A6xx: Implement preemption for A7XX targets
> > > >        drm/msm/A6xx: Sync relevant adreno_pm4.xml changes
> > > >        drm/msm/A6xx: Use posamble to reset counters on preemption
> > > >        drm/msm/A6xx: Add traces for preemption
> > > >        drm/msm/A6XX: Add a flag to allow preemption to submitqueue_create
> > > >        drm/msm/A6xx: Enable preemption for A750
> > > >        Documentation: document adreno preemption
> > > >
> > > >   Documentation/gpu/msm-preemption.rst               |  98 +++++
> > > >   drivers/gpu/drm/msm/Makefile                       |   1 +
> > > >   drivers/gpu/drm/msm/adreno/a2xx_gpu.c              |   2 +-
> > > >   drivers/gpu/drm/msm/adreno/a3xx_gpu.c              |   2 +-
> > > >   drivers/gpu/drm/msm/adreno/a4xx_gpu.c              |   2 +-
> > > >   drivers/gpu/drm/msm/adreno/a5xx_gpu.c              |   6 +-
> > > >   drivers/gpu/drm/msm/adreno/a6xx_catalog.c          |   7 +-
> > > >   drivers/gpu/drm/msm/adreno/a6xx_gpu.c              | 325 ++++++++++++++-
> > > >   drivers/gpu/drm/msm/adreno/a6xx_gpu.h              | 174 ++++++++
> > > >   drivers/gpu/drm/msm/adreno/a6xx_preempt.c          | 440 +++++++++++++++++++++
> > > >   drivers/gpu/drm/msm/adreno/adreno_gpu.h            |   9 +-
> > > >   drivers/gpu/drm/msm/msm_drv.c                      |   4 +
> > > >   drivers/gpu/drm/msm/msm_gpu.c                      |   2 +-
> > > >   drivers/gpu/drm/msm/msm_gpu.h                      |  11 -
> > > >   drivers/gpu/drm/msm/msm_gpu_trace.h                |  28 ++
> > > >   drivers/gpu/drm/msm/msm_ringbuffer.h               |  18 +
> > > >   drivers/gpu/drm/msm/msm_submitqueue.c              |   3 +
> > > >   drivers/gpu/drm/msm/registers/adreno/a6xx.xml      |   7 +-
> > > >   .../gpu/drm/msm/registers/adreno/adreno_pm4.xml    |  39 +-
> > > >   include/uapi/drm/msm_drm.h                         |   5 +-
> > > >   20 files changed, 1117 insertions(+), 66 deletions(-)
> > > > ---
> > > > base-commit: 7c626ce4bae1ac14f60076d00eafe71af30450ba
> > > > change-id: 20240815-preemption-a750-t-fcee9a844b39
> > > >
> > > > Best regards,
> > >
> > > I've been running vulkan-cts (1.3.7.3-0-gd71a36db16d98313c431829432a136dbda692a08 from Yocto)
> > > on SM8650-QRD, SM8550-QRD & SM8450-HDK boards with enable_preemption in default value
> > > and forced to 1, and I've seen no regression so far
> > >
> > > On SM8550, I've seen a few:
> > > platform 3d6a000.gmu: [drm:a6xx_hfi_send_msg.constprop.0 [msm]] *ERROR* Message HFI_H2F_MSG_GX_BW_PERF_VOTE id 2743 timed out waiting for response
> > > platform 3d6a000.gmu: [drm:a6xx_hfi_send_msg.constprop.0 [msm]] *ERROR* Unexpected message id 2743 on the response queue
> > > but it's unrelated to preempt
> > >
> > > and on SM8450:
> > > platform 3d6a000.gmu: [drm:a6xx_gmu_set_oob [msm]] *ERROR* Timeout waiting for GMU OOB set GPU_SET: 0x0
> > > msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 7.3.0.1: hangcheck detected gpu lockup rb 0!
> > > msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 7.3.0.1:     completed fence: 331235
> > > msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 7.3.0.1:     submitted fence: 331236
> > > adreno 3d00000.gpu: [drm:a6xx_irq [msm]] *ERROR* gpu fault ring 0 fence 50de4 status 00800005 rb 0000/0699 ib1 0000000000000000/0000 ib2 0000000000000000/0000
> > > msm_dpu ae01000.display-controller: [drm:recover_worker [msm]] *ERROR* 7.3.0.1: hangcheck recover!
> > > msm_dpu ae01000.display-controller: [drm:recover_worker [msm]] *ERROR* 7.3.0.1: offending task: deqp-vk (/usr/lib/vulkan-cts/deqp-vk)
> > > msm_dpu ae01000.display-controller: [drm:recover_worker [msm]] *ERROR* 7.3.0.1: hangcheck recover!
> > > leading to a VK_ERROR_DEVICE_LOST, but again unrelated to preempt support.
> >
> > I suspect on newer devices we have trouble resetting the GMU, leading
> > to (what I assume is happening here) the CPU thinking the GMU is in a
> > different state than it is.
> >
> > Which has led to some stability issues on a660 in mesa CI, if anything
> > crashes the gpu in the CI run it tends to kill the rest of the run
> > until the board is power cycled.
> >
> > https://gitlab.freedesktop.org/drm/msm/-/issues/37
> >
> > I think we have some work to do on making recovery more robust on
> > things newer than early a6xx things.
>
> Is this seen only with a particular scenario or is recovery always
> broken? I fixed recovery on 7c3 (a660 based) a couple of year ago,
> not sure what exactly regressed. At least I didn't see any issue on
> x185.

More recently my x1e (x1-85) and sc8280xp (a690) have been pretty
reliable about recovery.  And mesa CI seems to have gotten more
reliable at recovery when they uprev'd from v6.6.x to v6.11.x, so I
guess something in that range improved things?  But maybe not 100%,
kernel-ci (msm/msm_recovery@gpu-fault) can sometimes reproduce this,
apparently:

https://gitlab.freedesktop.org/drm/msm/-/issues/65

This test does 16 submits, with the 10th one having an invalid opcode,
and then checks that all the ones before and after successfully
execute a CP_MEM_WRITE:

https://gitlab.freedesktop.org/drm/igt-gpu-tools/-/blob/master/tests/msm/msm_recovery.c?ref_type=heads#L145
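
A rough, self-contained model of that check might look like the sketch
below. It is not the igt code: struct fake_submit, run_submits() and
count_lost_writes() are made-up names, the "GPU" is a plain loop that
assumes recovery works, and "10th" is taken to mean index 9 when
counting from zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_SUBMITS 16	/* submit count from the description above */

/* Hypothetical stand-in for one submit; not an igt or msm structure. */
struct fake_submit {
	bool invalid_opcode;	/* true for the submit that should fault */
	uint32_t value;		/* what a CP_MEM_WRITE-like packet would store */
};

/*
 * Model of a GPU whose recovery works (an assumption): the faulting
 * submit is dropped, every other submit still performs its write.
 */
static void run_submits(const struct fake_submit *s, uint32_t *scratch,
			size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (s[i].invalid_opcode)
			continue;	/* fault, recover, move on */
		scratch[i] = s[i].value;
	}
}

/* Return how many of the good submits failed to land their write. */
static size_t count_lost_writes(size_t bad_index)
{
	struct fake_submit submits[NUM_SUBMITS];
	uint32_t scratch[NUM_SUBMITS] = { 0 };
	size_t lost = 0;

	assert(bad_index < NUM_SUBMITS);

	for (size_t i = 0; i < NUM_SUBMITS; i++) {
		submits[i].invalid_opcode = (i == bad_index);
		submits[i].value = (uint32_t)i + 1;	/* non-zero marker */
	}

	run_submits(submits, scratch, NUM_SUBMITS);

	/* Every submit before and after the bad one must have written. */
	for (size_t i = 0; i < NUM_SUBMITS; i++) {
		if (i != bad_index && scratch[i] != submits[i].value)
			lost++;
	}
	return lost;
}
```

Calling count_lost_writes(9) puts the bad opcode in the 10th submit. A
non-zero return would correspond to the failure mode in the issue
linked above, where submits around the fault do not complete.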

BR,
-R

> -Akhil.
>
> >
> > BR,
> > -R
> >
> > > So you can also add:
> > > Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8550-QRD
> > > Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8450-HDK
> > >
> > > Thanks,
> > > Neil

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 00/11] Preemption support for A7XX
  2024-10-22 15:05       ` Rob Clark
@ 2024-10-23 16:21         ` Akhil P Oommen
  0 siblings, 0 replies; 35+ messages in thread
From: Akhil P Oommen @ 2024-10-23 16:21 UTC (permalink / raw)
  To: Rob Clark
  Cc: g, neil.armstrong, Antonino Maniscalco, Sean Paul, Abhinav Kumar,
	Dmitry Baryshkov, Marijn Suijten, David Airlie, Daniel Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet, linux-arm-msm, dri-devel, freedreno,
	linux-kernel, linux-doc, Sharat Masetty, Konrad Dybcio

On 10/22/2024 8:35 PM, Rob Clark wrote:
> On Fri, Sep 20, 2024 at 9:15 AM Akhil P Oommen <quic_akhilpo@quicinc.com> wrote:
>>
>> On Wed, Sep 18, 2024 at 08:39:30AM -0700, Rob Clark wrote:
>>> On Wed, Sep 18, 2024 at 12:46 AM Neil Armstrong
>>> <neil.armstrong@linaro.org> wrote:
>>>>
>>>> Hi,
>>>>
>>>> On 17/09/2024 13:14, Antonino Maniscalco wrote:
>>>>> This series implements preemption for A7XX targets, which allows the GPU to
>>>>> switch to an higher priority ring when work is pushed to it, reducing latency
>>>>> for high priority submissions.
>>>>>
>>>>> This series enables L1 preemption with skip_save_restore which requires
>>>>> the following userspace patches to function:
>>>>>
>>>>> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30544
>>>>>
>>>>> A flag is added to `msm_submitqueue_create` to only allow submissions
>>>>> from compatible userspace to be preempted, therefore maintaining
>>>>> compatibility.
>>>>>
>>>>> Preemption is currently only enabled by default on A750, it can be
>>>>> enabled on other targets through the `enable_preemption` module
>>>>> parameter. This is because more testing is required on other targets.
>>>>>
>>>>> For testing on other HW it is sufficient to set that parameter to a
>>>>> value of 1, then using the branch of mesa linked above, `TU_DEBUG=hiprio`
>>>>> allows to run any application as high priority therefore preempting
>>>>> submissions from other applications.
>>>>>
>>>>> The `msm_gpu_preemption_trigger` and `msm_gpu_preemption_irq` traces
>>>>> added in this series can be used to observe preemption's behavior as
>>>>> well as measuring preemption latency.
>>>>>
>>>>> Some commits from this series are based on a previous series to enable
>>>>> preemption on A6XX targets:
>>>>>
>>>>> https://lkml.kernel.org/1520489185-21828-1-git-send-email-smasetty@codeaurora.org
>>>>>
>>>>> Signed-off-by: Antonino Maniscalco <antomani103@gmail.com>
>>>>> ---
>>>>> Changes in v4:
>>>>> - Added missing register in pwrup list
>>>>> - Removed and rearrange barriers
>>>>> - Renamed `skip_inline_wptr` to `restore_wptr`
>>>>> - Track ctx seqno per ring
>>>>> - Removed secure preempt context
>>>>> - NOP out postamble to disable it instantly
>>>>> - Only emit pwrup reglist once
>>>>> - Document bv_rptr_addr
>>>>> - Removed unused A6XX_PREEMPT_USER_RECORD_SIZE
>>>>> - Set name on preempt record buffer
>>>>> - Link to v3: https://lore.kernel.org/r/20240905-preemption-a750-t-v3-0-fd947699f7bc@gmail.com
>>>>>
>>>>> Changes in v3:
>>>>> - Added documentation about preemption
>>>>> - Use quirks to determine which target supports preemption
>>>>> - Add a module parameter to force disabling or enabling preemption
>>>>> - Clear postamble when profiling
>>>>> - Define A6XX_CP_CONTEXT_SWITCH_CNTL_LEVEL fields in a6xx.xml
>>>>> - Make preemption records MAP_PRIV
>>>>> - Removed user ctx record (NON_PRIV) and patch 2/9 as it's not needed
>>>>>    anymore
>>>>> - Link to v2: https://lore.kernel.org/r/20240830-preemption-a750-t-v2-0-86aeead2cd80@gmail.com
>>>>>
>>>>> Changes in v2:
>>>>> - Added preept_record_size for X185 in PATCH 3/7
>>>>> - Added patches to reset perf counters
>>>>> - Dropped unused defines
>>>>> - Dropped unused variable (fixes warning)
>>>>> - Only enable preemption on a750
>>>>> - Reject MSM_SUBMITQUEUE_ALLOW_PREEMPT for unsupported targets
>>>>> - Added Akhil's Reviewed-By tags to patches 1/9,2/9,3/9
>>>>> - Added Neil's Tested-By tags
>>>>> - Added explanation for UAPI changes in commit message
>>>>> - Link to v1: https://lore.kernel.org/r/20240815-preemption-a750-t-v1-0-7bda26c34037@gmail.com
>>>>>
>>>>> ---
>>>>> Antonino Maniscalco (11):
>>>>>        drm/msm: Fix bv_fence being used as bv_rptr
>>>>>        drm/msm/A6XX: Track current_ctx_seqno per ring
>>>>>        drm/msm: Add a `preempt_record_size` field
>>>>>        drm/msm: Add CONTEXT_SWITCH_CNTL bitfields
>>>>>        drm/msm/A6xx: Implement preemption for A7XX targets
>>>>>        drm/msm/A6xx: Sync relevant adreno_pm4.xml changes
>>>>>        drm/msm/A6xx: Use posamble to reset counters on preemption
>>>>>        drm/msm/A6xx: Add traces for preemption
>>>>>        drm/msm/A6XX: Add a flag to allow preemption to submitqueue_create
>>>>>        drm/msm/A6xx: Enable preemption for A750
>>>>>        Documentation: document adreno preemption
>>>>>
>>>>>   Documentation/gpu/msm-preemption.rst               |  98 +++++
>>>>>   drivers/gpu/drm/msm/Makefile                       |   1 +
>>>>>   drivers/gpu/drm/msm/adreno/a2xx_gpu.c              |   2 +-
>>>>>   drivers/gpu/drm/msm/adreno/a3xx_gpu.c              |   2 +-
>>>>>   drivers/gpu/drm/msm/adreno/a4xx_gpu.c              |   2 +-
>>>>>   drivers/gpu/drm/msm/adreno/a5xx_gpu.c              |   6 +-
>>>>>   drivers/gpu/drm/msm/adreno/a6xx_catalog.c          |   7 +-
>>>>>   drivers/gpu/drm/msm/adreno/a6xx_gpu.c              | 325 ++++++++++++++-
>>>>>   drivers/gpu/drm/msm/adreno/a6xx_gpu.h              | 174 ++++++++
>>>>>   drivers/gpu/drm/msm/adreno/a6xx_preempt.c          | 440 +++++++++++++++++++++
>>>>>   drivers/gpu/drm/msm/adreno/adreno_gpu.h            |   9 +-
>>>>>   drivers/gpu/drm/msm/msm_drv.c                      |   4 +
>>>>>   drivers/gpu/drm/msm/msm_gpu.c                      |   2 +-
>>>>>   drivers/gpu/drm/msm/msm_gpu.h                      |  11 -
>>>>>   drivers/gpu/drm/msm/msm_gpu_trace.h                |  28 ++
>>>>>   drivers/gpu/drm/msm/msm_ringbuffer.h               |  18 +
>>>>>   drivers/gpu/drm/msm/msm_submitqueue.c              |   3 +
>>>>>   drivers/gpu/drm/msm/registers/adreno/a6xx.xml      |   7 +-
>>>>>   .../gpu/drm/msm/registers/adreno/adreno_pm4.xml    |  39 +-
>>>>>   include/uapi/drm/msm_drm.h                         |   5 +-
>>>>>   20 files changed, 1117 insertions(+), 66 deletions(-)
>>>>> ---
>>>>> base-commit: 7c626ce4bae1ac14f60076d00eafe71af30450ba
>>>>> change-id: 20240815-preemption-a750-t-fcee9a844b39
>>>>>
>>>>> Best regards,
>>>>
>>>> I've been running vulkan-cts (1.3.7.3-0-gd71a36db16d98313c431829432a136dbda692a08 from Yocto)
>>>> on SM8650-QRD, SM8550-QRD & SM8450-HDK boards with enable_preemption in default value
>>>> and forced to 1, and I've seen no regression so far
>>>>
>>>> On SM8550, I've seen a few:
>>>> platform 3d6a000.gmu: [drm:a6xx_hfi_send_msg.constprop.0 [msm]] *ERROR* Message HFI_H2F_MSG_GX_BW_PERF_VOTE id 2743 timed out waiting for response
>>>> platform 3d6a000.gmu: [drm:a6xx_hfi_send_msg.constprop.0 [msm]] *ERROR* Unexpected message id 2743 on the response queue
>>>> but it's unrelated to preempt
>>>>
>>>> and on SM8450:
>>>> platform 3d6a000.gmu: [drm:a6xx_gmu_set_oob [msm]] *ERROR* Timeout waiting for GMU OOB set GPU_SET: 0x0
>>>> msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 7.3.0.1: hangcheck detected gpu lockup rb 0!
>>>> msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 7.3.0.1:     completed fence: 331235
>>>> msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 7.3.0.1:     submitted fence: 331236
>>>> adreno 3d00000.gpu: [drm:a6xx_irq [msm]] *ERROR* gpu fault ring 0 fence 50de4 status 00800005 rb 0000/0699 ib1 0000000000000000/0000 ib2 0000000000000000/0000
>>>> msm_dpu ae01000.display-controller: [drm:recover_worker [msm]] *ERROR* 7.3.0.1: hangcheck recover!
>>>> msm_dpu ae01000.display-controller: [drm:recover_worker [msm]] *ERROR* 7.3.0.1: offending task: deqp-vk (/usr/lib/vulkan-cts/deqp-vk)
>>>> msm_dpu ae01000.display-controller: [drm:recover_worker [msm]] *ERROR* 7.3.0.1: hangcheck recover!
>>>> leading to a VK_ERROR_DEVICE_LOST, but again unrelated to preempt support.
>>>
>>> I suspect on newer devices we have trouble resetting the GMU, leading
>>> to (what I assume is happening here) the CPU thinking the GMU is in a
>>> different state than it is.
>>>
>>> Which has led to some stability issues on a660 in mesa CI, if anything
>>> crashes the gpu in the CI run it tends to kill the rest of the run
>>> until the board is power cycled.
>>>
>>> https://gitlab.freedesktop.org/drm/msm/-/issues/37
>>>
>>> I think we have some work to do on making recovery more robust on
>>> things newer than early a6xx things.
>>
>> Is this seen only with a particular scenario or is recovery always
>> broken? I fixed recovery on 7c3 (a660 based) a couple of year ago,
>> not sure what exactly regressed. At least I didn't see any issue on
>> x185.
> 
> More recently my x1e (x1-85) and sc8280xp (a690) have been pretty
> reliable about recovery.  And mesa CI seems to have gotten more
> reliable at recovery when they uprev'd from v6.6x to v6.11.x, so I
> guess something in that range improved things?  But maybe not 100%,
> kernel-ci (msm/msm_recovery@gpu-fault) can sometimes reproduce this,
> apparently:
> 
> https://gitlab.freedesktop.org/drm/msm/-/issues/65
> 
> This test does 16 submits, with the 10th one having an invalid opc,
> and then checks that all the ones before and after successfully
> execute a CP_MEM_WRITE:
> 
> https://gitlab.freedesktop.org/drm/igt-gpu-tools/-/blob/master/tests/msm/msm_recovery.c?ref_type=heads#L145
>

I suppose we don't have a gpu coredump available. A663 is pretty similar to
A660, so I can try to reproduce this issue there. Will check this out.

-Akhil

> BR,
> -R
> 
>> -Akhil.
>>
>>>
>>> BR,
>>> -R
>>>
>>>> So you can also add:
>>>> Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8550-QRD
>>>> Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8450-HDK
>>>>
>>>> Thanks,
>>>> Neil


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 00/11] Preemption support for A7XX
  2024-09-18  7:46 ` [PATCH v4 00/11] Preemption support for A7XX Neil Armstrong
  2024-09-18 11:20   ` Antonino Maniscalco
  2024-09-18 15:39   ` Rob Clark
@ 2024-09-20 17:09   ` Akhil P Oommen
  2024-09-21 16:45     ` Neil Armstrong
  2024-09-24 11:53     ` Antonino Maniscalco
  2 siblings, 2 replies; 35+ messages in thread
From: Akhil P Oommen @ 2024-09-20 17:09 UTC (permalink / raw)
  To: Neil Armstrong
  Cc: Antonino Maniscalco, Rob Clark, Sean Paul, Konrad Dybcio,
	Abhinav Kumar, Dmitry Baryshkov, Marijn Suijten, David Airlie,
	Daniel Vetter, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, Jonathan Corbet, linux-arm-msm, dri-devel,
	freedreno, linux-kernel, linux-doc, Sharat Masetty

On Wed, Sep 18, 2024 at 09:46:33AM +0200, Neil Armstrong wrote:
> Hi,
> 
> On 17/09/2024 13:14, Antonino Maniscalco wrote:
> > This series implements preemption for A7XX targets, which allows the GPU to
> > switch to an higher priority ring when work is pushed to it, reducing latency
> > for high priority submissions.
> > 
> > This series enables L1 preemption with skip_save_restore which requires
> > the following userspace patches to function:
> > 
> > https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30544
> > 
> > A flag is added to `msm_submitqueue_create` to only allow submissions
> > from compatible userspace to be preempted, therefore maintaining
> > compatibility.
> > 
> > Preemption is currently only enabled by default on A750, it can be
> > enabled on other targets through the `enable_preemption` module
> > parameter. This is because more testing is required on other targets.
> > 
> > For testing on other HW it is sufficient to set that parameter to a
> > value of 1, then using the branch of mesa linked above, `TU_DEBUG=hiprio`
> > allows to run any application as high priority therefore preempting
> > submissions from other applications.
> > 
> > The `msm_gpu_preemption_trigger` and `msm_gpu_preemption_irq` traces
> > added in this series can be used to observe preemption's behavior as
> > well as measuring preemption latency.
> > 
> > Some commits from this series are based on a previous series to enable
> > preemption on A6XX targets:
> > 
> > https://lkml.kernel.org/1520489185-21828-1-git-send-email-smasetty@codeaurora.org
> > 
> > Signed-off-by: Antonino Maniscalco <antomani103@gmail.com>
> > ---
> > Changes in v4:
> > - Added missing register in pwrup list
> > - Removed and rearrange barriers
> > - Renamed `skip_inline_wptr` to `restore_wptr`
> > - Track ctx seqno per ring
> > - Removed secure preempt context
> > - NOP out postamble to disable it instantly
> > - Only emit pwrup reglist once
> > - Document bv_rptr_addr
> > - Removed unused A6XX_PREEMPT_USER_RECORD_SIZE
> > - Set name on preempt record buffer
> > - Link to v3: https://lore.kernel.org/r/20240905-preemption-a750-t-v3-0-fd947699f7bc@gmail.com
> > 
> > Changes in v3:
> > - Added documentation about preemption
> > - Use quirks to determine which target supports preemption
> > - Add a module parameter to force disabling or enabling preemption
> > - Clear postamble when profiling
> > - Define A6XX_CP_CONTEXT_SWITCH_CNTL_LEVEL fields in a6xx.xml
> > - Make preemption records MAP_PRIV
> > - Removed user ctx record (NON_PRIV) and patch 2/9 as it's not needed
> >    anymore
> > - Link to v2: https://lore.kernel.org/r/20240830-preemption-a750-t-v2-0-86aeead2cd80@gmail.com
> > 
> > Changes in v2:
> > - Added preept_record_size for X185 in PATCH 3/7
> > - Added patches to reset perf counters
> > - Dropped unused defines
> > - Dropped unused variable (fixes warning)
> > - Only enable preemption on a750
> > - Reject MSM_SUBMITQUEUE_ALLOW_PREEMPT for unsupported targets
> > - Added Akhil's Reviewed-By tags to patches 1/9,2/9,3/9
> > - Added Neil's Tested-By tags
> > - Added explanation for UAPI changes in commit message
> > - Link to v1: https://lore.kernel.org/r/20240815-preemption-a750-t-v1-0-7bda26c34037@gmail.com
> > 
> > ---
> > Antonino Maniscalco (11):
> >        drm/msm: Fix bv_fence being used as bv_rptr
> >        drm/msm/A6XX: Track current_ctx_seqno per ring
> >        drm/msm: Add a `preempt_record_size` field
> >        drm/msm: Add CONTEXT_SWITCH_CNTL bitfields
> >        drm/msm/A6xx: Implement preemption for A7XX targets
> >        drm/msm/A6xx: Sync relevant adreno_pm4.xml changes
> >        drm/msm/A6xx: Use posamble to reset counters on preemption
> >        drm/msm/A6xx: Add traces for preemption
> >        drm/msm/A6XX: Add a flag to allow preemption to submitqueue_create
> >        drm/msm/A6xx: Enable preemption for A750
> >        Documentation: document adreno preemption
> > 
> >   Documentation/gpu/msm-preemption.rst               |  98 +++++
> >   drivers/gpu/drm/msm/Makefile                       |   1 +
> >   drivers/gpu/drm/msm/adreno/a2xx_gpu.c              |   2 +-
> >   drivers/gpu/drm/msm/adreno/a3xx_gpu.c              |   2 +-
> >   drivers/gpu/drm/msm/adreno/a4xx_gpu.c              |   2 +-
> >   drivers/gpu/drm/msm/adreno/a5xx_gpu.c              |   6 +-
> >   drivers/gpu/drm/msm/adreno/a6xx_catalog.c          |   7 +-
> >   drivers/gpu/drm/msm/adreno/a6xx_gpu.c              | 325 ++++++++++++++-
> >   drivers/gpu/drm/msm/adreno/a6xx_gpu.h              | 174 ++++++++
> >   drivers/gpu/drm/msm/adreno/a6xx_preempt.c          | 440 +++++++++++++++++++++
> >   drivers/gpu/drm/msm/adreno/adreno_gpu.h            |   9 +-
> >   drivers/gpu/drm/msm/msm_drv.c                      |   4 +
> >   drivers/gpu/drm/msm/msm_gpu.c                      |   2 +-
> >   drivers/gpu/drm/msm/msm_gpu.h                      |  11 -
> >   drivers/gpu/drm/msm/msm_gpu_trace.h                |  28 ++
> >   drivers/gpu/drm/msm/msm_ringbuffer.h               |  18 +
> >   drivers/gpu/drm/msm/msm_submitqueue.c              |   3 +
> >   drivers/gpu/drm/msm/registers/adreno/a6xx.xml      |   7 +-
> >   .../gpu/drm/msm/registers/adreno/adreno_pm4.xml    |  39 +-
> >   include/uapi/drm/msm_drm.h                         |   5 +-
> >   20 files changed, 1117 insertions(+), 66 deletions(-)
> > ---
> > base-commit: 7c626ce4bae1ac14f60076d00eafe71af30450ba
> > change-id: 20240815-preemption-a750-t-fcee9a844b39
> > 
> > Best regards,
> 
> I've been running vulkan-cts (1.3.7.3-0-gd71a36db16d98313c431829432a136dbda692a08 from Yocto)
> on SM8650-QRD, SM8550-QRD & SM8450-HDK boards with enable_preemption in default value
> and forced to 1, and I've seen no regression so far
> 
> On SM8550, I've seen a few:
> platform 3d6a000.gmu: [drm:a6xx_hfi_send_msg.constprop.0 [msm]] *ERROR* Message HFI_H2F_MSG_GX_BW_PERF_VOTE id 2743 timed out waiting for response
> platform 3d6a000.gmu: [drm:a6xx_hfi_send_msg.constprop.0 [msm]] *ERROR* Unexpected message id 2743 on the response queue
> but it's unrelated to preempt
> 
> and on SM8450:
> platform 3d6a000.gmu: [drm:a6xx_gmu_set_oob [msm]] *ERROR* Timeout waiting for GMU OOB set GPU_SET: 0x0
> msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 7.3.0.1: hangcheck detected gpu lockup rb 0!
> msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 7.3.0.1:     completed fence: 331235
> msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 7.3.0.1:     submitted fence: 331236
> adreno 3d00000.gpu: [drm:a6xx_irq [msm]] *ERROR* gpu fault ring 0 fence 50de4 status 00800005 rb 0000/0699 ib1 0000000000000000/0000 ib2 0000000000000000/0000
> msm_dpu ae01000.display-controller: [drm:recover_worker [msm]] *ERROR* 7.3.0.1: hangcheck recover!
> msm_dpu ae01000.display-controller: [drm:recover_worker [msm]] *ERROR* 7.3.0.1: offending task: deqp-vk (/usr/lib/vulkan-cts/deqp-vk)
> msm_dpu ae01000.display-controller: [drm:recover_worker [msm]] *ERROR* 7.3.0.1: hangcheck recover!
> leading to a VK_ERROR_DEVICE_LOST, but again unrelated to preempt support.
> 
> So you can also add:
> Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8550-QRD
> Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8450-HDK
> 

Neil,

On my x1e device, all submissions were somehow going into a single
ring, even the compositor's. Not sure why. So effectively preemption was
not really exercised. For one of the two benchmarks I ran, I had to use
the "hiprio" mesa debug flag to force its submissions to ring 0.

If possible it is a good idea to check the new preemption traces to
ensure preemption kicks in.
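
One way to do that check, as a minimal sketch: capture the ftrace text
output while the workload runs and pair each `msm_gpu_preemption_trigger`
event with the next `msm_gpu_preemption_irq` event. The script below only
assumes the usual ftrace line layout (`task-pid [cpu] flags timestamp:
event: fields`). The sample lines and their field text are made up for
illustration and are not real output from this series.

```python
import re

# Generic ftrace text line: "<task>-<pid> [<cpu>] <flags> <timestamp>: <event>: <fields>"
# Only the timestamp and the event name are used; event fields are ignored.
LINE_RE = re.compile(r"\s(\d+\.\d+):\s+(\w+):")

TRIGGER = "msm_gpu_preemption_trigger"
IRQ = "msm_gpu_preemption_irq"


def preemption_latencies(lines):
    """Return the time in seconds between each trigger and the next irq event."""
    latencies = []
    pending = None  # timestamp of the last trigger that has no irq yet
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue
        timestamp, event = float(match.group(1)), match.group(2)
        if event == TRIGGER:
            pending = timestamp
        elif event == IRQ and pending is not None:
            latencies.append(timestamp - pending)
            pending = None
    return latencies


# Hypothetical sample in the usual ftrace layout (not captured from hardware).
SAMPLE = """\
 kworker/u16:3-212   [002] ..... 100.000100: msm_gpu_preemption_trigger: (hypothetical fields)
          <idle>-0   [002] d.h.. 100.000350: msm_gpu_preemption_irq: (hypothetical fields)
"""

latencies = preemption_latencies(SAMPLE.splitlines())
print(f"{len(latencies)} preemption(s) observed")
```

An empty result on a real capture would suggest that preemption never
kicked in, for example because everything landed on a single ring.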

-Akhil

> Thanks,
> Neil


* Re: [PATCH v4 00/11] Preemption support for A7XX
  2024-09-20 17:09   ` Akhil P Oommen
@ 2024-09-21 16:45     ` Neil Armstrong
  2024-09-24 11:53     ` Antonino Maniscalco
  1 sibling, 0 replies; 35+ messages in thread
From: Neil Armstrong @ 2024-09-21 16:45 UTC (permalink / raw)
  To: Akhil P Oommen
  Cc: Antonino Maniscalco, Rob Clark, Sean Paul, Konrad Dybcio,
	Abhinav Kumar, Dmitry Baryshkov, Marijn Suijten, David Airlie,
	Daniel Vetter, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, Jonathan Corbet, linux-arm-msm, dri-devel,
	freedreno, linux-kernel, linux-doc, Sharat Masetty

On 20/09/2024 at 19:09, Akhil P Oommen wrote:
> On Wed, Sep 18, 2024 at 09:46:33AM +0200, Neil Armstrong wrote:
>> Hi,
>>
>> On 17/09/2024 13:14, Antonino Maniscalco wrote:
>>> This series implements preemption for A7XX targets, which allows the GPU to
>>> switch to a higher priority ring when work is pushed to it, reducing latency
>>> for high priority submissions.
>>>
>>> This series enables L1 preemption with skip_save_restore which requires
>>> the following userspace patches to function:
>>>
>>> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30544
>>>
>>> A flag is added to `msm_submitqueue_create` to only allow submissions
>>> from compatible userspace to be preempted, therefore maintaining
>>> compatibility.
>>>
>>> Preemption is currently only enabled by default on A750, it can be
>>> enabled on other targets through the `enable_preemption` module
>>> parameter. This is because more testing is required on other targets.
>>>
>>> For testing on other HW it is sufficient to set that parameter to a
>>> value of 1. Then, using the branch of mesa linked above, `TU_DEBUG=hiprio`
>>> allows running any application as high priority, therefore preempting
>>> submissions from other applications.
>>>
>>> The `msm_gpu_preemption_trigger` and `msm_gpu_preemption_irq` traces
>>> added in this series can be used to observe preemption's behavior as
>>> well as measuring preemption latency.
>>>
>>> Some commits from this series are based on a previous series to enable
>>> preemption on A6XX targets:
>>>
>>> https://lkml.kernel.org/1520489185-21828-1-git-send-email-smasetty@codeaurora.org
>>>
>>> Signed-off-by: Antonino Maniscalco <antomani103@gmail.com>
>>> ---
>>> Changes in v4:
>>> - Added missing register in pwrup list
>>> - Removed and rearranged barriers
>>> - Renamed `skip_inline_wptr` to `restore_wptr`
>>> - Track ctx seqno per ring
>>> - Removed secure preempt context
>>> - NOP out postamble to disable it instantly
>>> - Only emit pwrup reglist once
>>> - Document bv_rptr_addr
>>> - Removed unused A6XX_PREEMPT_USER_RECORD_SIZE
>>> - Set name on preempt record buffer
>>> - Link to v3: https://lore.kernel.org/r/20240905-preemption-a750-t-v3-0-fd947699f7bc@gmail.com
>>>
>>> Changes in v3:
>>> - Added documentation about preemption
>>> - Use quirks to determine which target supports preemption
>>> - Add a module parameter to force disabling or enabling preemption
>>> - Clear postamble when profiling
>>> - Define A6XX_CP_CONTEXT_SWITCH_CNTL_LEVEL fields in a6xx.xml
>>> - Make preemption records MAP_PRIV
>>> - Removed user ctx record (NON_PRIV) and patch 2/9 as it's not needed
>>>     anymore
>>> - Link to v2: https://lore.kernel.org/r/20240830-preemption-a750-t-v2-0-86aeead2cd80@gmail.com
>>>
>>> Changes in v2:
>>> - Added preempt_record_size for X185 in PATCH 3/7
>>> - Added patches to reset perf counters
>>> - Dropped unused defines
>>> - Dropped unused variable (fixes warning)
>>> - Only enable preemption on a750
>>> - Reject MSM_SUBMITQUEUE_ALLOW_PREEMPT for unsupported targets
>>> - Added Akhil's Reviewed-By tags to patches 1/9,2/9,3/9
>>> - Added Neil's Tested-By tags
>>> - Added explanation for UAPI changes in commit message
>>> - Link to v1: https://lore.kernel.org/r/20240815-preemption-a750-t-v1-0-7bda26c34037@gmail.com
>>>
>>> ---
>>> Antonino Maniscalco (11):
>>>         drm/msm: Fix bv_fence being used as bv_rptr
>>>         drm/msm/A6XX: Track current_ctx_seqno per ring
>>>         drm/msm: Add a `preempt_record_size` field
>>>         drm/msm: Add CONTEXT_SWITCH_CNTL bitfields
>>>         drm/msm/A6xx: Implement preemption for A7XX targets
>>>         drm/msm/A6xx: Sync relevant adreno_pm4.xml changes
>>>         drm/msm/A6xx: Use posamble to reset counters on preemption
>>>         drm/msm/A6xx: Add traces for preemption
>>>         drm/msm/A6XX: Add a flag to allow preemption to submitqueue_create
>>>         drm/msm/A6xx: Enable preemption for A750
>>>         Documentation: document adreno preemption
>>>
>>>    Documentation/gpu/msm-preemption.rst               |  98 +++++
>>>    drivers/gpu/drm/msm/Makefile                       |   1 +
>>>    drivers/gpu/drm/msm/adreno/a2xx_gpu.c              |   2 +-
>>>    drivers/gpu/drm/msm/adreno/a3xx_gpu.c              |   2 +-
>>>    drivers/gpu/drm/msm/adreno/a4xx_gpu.c              |   2 +-
>>>    drivers/gpu/drm/msm/adreno/a5xx_gpu.c              |   6 +-
>>>    drivers/gpu/drm/msm/adreno/a6xx_catalog.c          |   7 +-
>>>    drivers/gpu/drm/msm/adreno/a6xx_gpu.c              | 325 ++++++++++++++-
>>>    drivers/gpu/drm/msm/adreno/a6xx_gpu.h              | 174 ++++++++
>>>    drivers/gpu/drm/msm/adreno/a6xx_preempt.c          | 440 +++++++++++++++++++++
>>>    drivers/gpu/drm/msm/adreno/adreno_gpu.h            |   9 +-
>>>    drivers/gpu/drm/msm/msm_drv.c                      |   4 +
>>>    drivers/gpu/drm/msm/msm_gpu.c                      |   2 +-
>>>    drivers/gpu/drm/msm/msm_gpu.h                      |  11 -
>>>    drivers/gpu/drm/msm/msm_gpu_trace.h                |  28 ++
>>>    drivers/gpu/drm/msm/msm_ringbuffer.h               |  18 +
>>>    drivers/gpu/drm/msm/msm_submitqueue.c              |   3 +
>>>    drivers/gpu/drm/msm/registers/adreno/a6xx.xml      |   7 +-
>>>    .../gpu/drm/msm/registers/adreno/adreno_pm4.xml    |  39 +-
>>>    include/uapi/drm/msm_drm.h                         |   5 +-
>>>    20 files changed, 1117 insertions(+), 66 deletions(-)
>>> ---
>>> base-commit: 7c626ce4bae1ac14f60076d00eafe71af30450ba
>>> change-id: 20240815-preemption-a750-t-fcee9a844b39
>>>
>>> Best regards,
>>
>> I've been running vulkan-cts (1.3.7.3-0-gd71a36db16d98313c431829432a136dbda692a08 from Yocto)
>> on SM8650-QRD, SM8550-QRD & SM8450-HDK boards with enable_preemption in default value
>> and forced to 1, and I've seen no regression so far
>>
>> On SM8550, I've seen a few:
>> platform 3d6a000.gmu: [drm:a6xx_hfi_send_msg.constprop.0 [msm]] *ERROR* Message HFI_H2F_MSG_GX_BW_PERF_VOTE id 2743 timed out waiting for response
>> platform 3d6a000.gmu: [drm:a6xx_hfi_send_msg.constprop.0 [msm]] *ERROR* Unexpected message id 2743 on the response queue
>> but it's unrelated to preempt
>>
>> and on SM8450:
>> platform 3d6a000.gmu: [drm:a6xx_gmu_set_oob [msm]] *ERROR* Timeout waiting for GMU OOB set GPU_SET: 0x0
>> msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 7.3.0.1: hangcheck detected gpu lockup rb 0!
>> msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 7.3.0.1:     completed fence: 331235
>> msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 7.3.0.1:     submitted fence: 331236
>> adreno 3d00000.gpu: [drm:a6xx_irq [msm]] *ERROR* gpu fault ring 0 fence 50de4 status 00800005 rb 0000/0699 ib1 0000000000000000/0000 ib2 0000000000000000/0000
>> msm_dpu ae01000.display-controller: [drm:recover_worker [msm]] *ERROR* 7.3.0.1: hangcheck recover!
>> msm_dpu ae01000.display-controller: [drm:recover_worker [msm]] *ERROR* 7.3.0.1: offending task: deqp-vk (/usr/lib/vulkan-cts/deqp-vk)
>> msm_dpu ae01000.display-controller: [drm:recover_worker [msm]] *ERROR* 7.3.0.1: hangcheck recover!
>> leading to a VK_ERROR_DEVICE_LOST, but again unrelated to preempt support.
>>
>> So you can also add:
>> Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8550-QRD
>> Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8450-HDK
>>
> 
> Neil,
> 
> On my x1e device, all submissions were somehow going into a single
> ring, even the compositor's. Not sure why. So effectively preemption was
> not really exercised. For one of the two benchmarks I ran, I had to use
> the "hiprio" mesa debug flag to force its submissions to ring 0.
> 
> If possible it is a good idea to check the new preemption traces to
> ensure preemption kicks in.

Sure, I'll run the tests again on a750 and check whether preemption kicks in.

Neil

> 
> -Akhil
> 
>> Thanks,
>> Neil



* Re: [PATCH v4 00/11] Preemption support for A7XX
  2024-09-20 17:09   ` Akhil P Oommen
  2024-09-21 16:45     ` Neil Armstrong
@ 2024-09-24 11:53     ` Antonino Maniscalco
  2024-09-24 14:47       ` Rob Clark
  1 sibling, 1 reply; 35+ messages in thread
From: Antonino Maniscalco @ 2024-09-24 11:53 UTC (permalink / raw)
  To: Akhil P Oommen, Neil Armstrong
  Cc: Rob Clark, Sean Paul, Konrad Dybcio, Abhinav Kumar,
	Dmitry Baryshkov, Marijn Suijten, David Airlie, Daniel Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet, linux-arm-msm, dri-devel, freedreno,
	linux-kernel, linux-doc, Sharat Masetty

On 9/20/24 7:09 PM, Akhil P Oommen wrote:
> On Wed, Sep 18, 2024 at 09:46:33AM +0200, Neil Armstrong wrote:
>> Hi,
>>
>> On 17/09/2024 13:14, Antonino Maniscalco wrote:
>>> This series implements preemption for A7XX targets, which allows the GPU to
>>> switch to a higher priority ring when work is pushed to it, reducing latency
>>> for high priority submissions.
>>>
>>> This series enables L1 preemption with skip_save_restore which requires
>>> the following userspace patches to function:
>>>
>>> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30544
>>>
>>> A flag is added to `msm_submitqueue_create` to only allow submissions
>>> from compatible userspace to be preempted, therefore maintaining
>>> compatibility.
>>>
>>> Preemption is currently only enabled by default on A750, it can be
>>> enabled on other targets through the `enable_preemption` module
>>> parameter. This is because more testing is required on other targets.
>>>
>>> For testing on other HW it is sufficient to set that parameter to a
>>> value of 1. Then, using the branch of mesa linked above, `TU_DEBUG=hiprio`
>>> allows running any application as high priority, therefore preempting
>>> submissions from other applications.
>>>
>>> The `msm_gpu_preemption_trigger` and `msm_gpu_preemption_irq` traces
>>> added in this series can be used to observe preemption's behavior as
>>> well as measuring preemption latency.
>>>
>>> Some commits from this series are based on a previous series to enable
>>> preemption on A6XX targets:
>>>
>>> https://lkml.kernel.org/1520489185-21828-1-git-send-email-smasetty@codeaurora.org
>>>
>>> Signed-off-by: Antonino Maniscalco <antomani103@gmail.com>
>>> ---
>>> Changes in v4:
>>> - Added missing register in pwrup list
>>> - Removed and rearranged barriers
>>> - Renamed `skip_inline_wptr` to `restore_wptr`
>>> - Track ctx seqno per ring
>>> - Removed secure preempt context
>>> - NOP out postamble to disable it instantly
>>> - Only emit pwrup reglist once
>>> - Document bv_rptr_addr
>>> - Removed unused A6XX_PREEMPT_USER_RECORD_SIZE
>>> - Set name on preempt record buffer
>>> - Link to v3: https://lore.kernel.org/r/20240905-preemption-a750-t-v3-0-fd947699f7bc@gmail.com
>>>
>>> Changes in v3:
>>> - Added documentation about preemption
>>> - Use quirks to determine which target supports preemption
>>> - Add a module parameter to force disabling or enabling preemption
>>> - Clear postamble when profiling
>>> - Define A6XX_CP_CONTEXT_SWITCH_CNTL_LEVEL fields in a6xx.xml
>>> - Make preemption records MAP_PRIV
>>> - Removed user ctx record (NON_PRIV) and patch 2/9 as it's not needed
>>>     anymore
>>> - Link to v2: https://lore.kernel.org/r/20240830-preemption-a750-t-v2-0-86aeead2cd80@gmail.com
>>>
>>> Changes in v2:
>>> - Added preempt_record_size for X185 in PATCH 3/7
>>> - Added patches to reset perf counters
>>> - Dropped unused defines
>>> - Dropped unused variable (fixes warning)
>>> - Only enable preemption on a750
>>> - Reject MSM_SUBMITQUEUE_ALLOW_PREEMPT for unsupported targets
>>> - Added Akhil's Reviewed-By tags to patches 1/9,2/9,3/9
>>> - Added Neil's Tested-By tags
>>> - Added explanation for UAPI changes in commit message
>>> - Link to v1: https://lore.kernel.org/r/20240815-preemption-a750-t-v1-0-7bda26c34037@gmail.com
>>>
>>> ---
>>> Antonino Maniscalco (11):
>>>         drm/msm: Fix bv_fence being used as bv_rptr
>>>         drm/msm/A6XX: Track current_ctx_seqno per ring
>>>         drm/msm: Add a `preempt_record_size` field
>>>         drm/msm: Add CONTEXT_SWITCH_CNTL bitfields
>>>         drm/msm/A6xx: Implement preemption for A7XX targets
>>>         drm/msm/A6xx: Sync relevant adreno_pm4.xml changes
>>>         drm/msm/A6xx: Use posamble to reset counters on preemption
>>>         drm/msm/A6xx: Add traces for preemption
>>>         drm/msm/A6XX: Add a flag to allow preemption to submitqueue_create
>>>         drm/msm/A6xx: Enable preemption for A750
>>>         Documentation: document adreno preemption
>>>
>>>    Documentation/gpu/msm-preemption.rst               |  98 +++++
>>>    drivers/gpu/drm/msm/Makefile                       |   1 +
>>>    drivers/gpu/drm/msm/adreno/a2xx_gpu.c              |   2 +-
>>>    drivers/gpu/drm/msm/adreno/a3xx_gpu.c              |   2 +-
>>>    drivers/gpu/drm/msm/adreno/a4xx_gpu.c              |   2 +-
>>>    drivers/gpu/drm/msm/adreno/a5xx_gpu.c              |   6 +-
>>>    drivers/gpu/drm/msm/adreno/a6xx_catalog.c          |   7 +-
>>>    drivers/gpu/drm/msm/adreno/a6xx_gpu.c              | 325 ++++++++++++++-
>>>    drivers/gpu/drm/msm/adreno/a6xx_gpu.h              | 174 ++++++++
>>>    drivers/gpu/drm/msm/adreno/a6xx_preempt.c          | 440 +++++++++++++++++++++
>>>    drivers/gpu/drm/msm/adreno/adreno_gpu.h            |   9 +-
>>>    drivers/gpu/drm/msm/msm_drv.c                      |   4 +
>>>    drivers/gpu/drm/msm/msm_gpu.c                      |   2 +-
>>>    drivers/gpu/drm/msm/msm_gpu.h                      |  11 -
>>>    drivers/gpu/drm/msm/msm_gpu_trace.h                |  28 ++
>>>    drivers/gpu/drm/msm/msm_ringbuffer.h               |  18 +
>>>    drivers/gpu/drm/msm/msm_submitqueue.c              |   3 +
>>>    drivers/gpu/drm/msm/registers/adreno/a6xx.xml      |   7 +-
>>>    .../gpu/drm/msm/registers/adreno/adreno_pm4.xml    |  39 +-
>>>    include/uapi/drm/msm_drm.h                         |   5 +-
>>>    20 files changed, 1117 insertions(+), 66 deletions(-)
>>> ---
>>> base-commit: 7c626ce4bae1ac14f60076d00eafe71af30450ba
>>> change-id: 20240815-preemption-a750-t-fcee9a844b39
>>>
>>> Best regards,
>>
>> I've been running vulkan-cts (1.3.7.3-0-gd71a36db16d98313c431829432a136dbda692a08 from Yocto)
>> on SM8650-QRD, SM8550-QRD & SM8450-HDK boards with enable_preemption in default value
>> and forced to 1, and I've seen no regression so far
>>
>> On SM8550, I've seen a few:
>> platform 3d6a000.gmu: [drm:a6xx_hfi_send_msg.constprop.0 [msm]] *ERROR* Message HFI_H2F_MSG_GX_BW_PERF_VOTE id 2743 timed out waiting for response
>> platform 3d6a000.gmu: [drm:a6xx_hfi_send_msg.constprop.0 [msm]] *ERROR* Unexpected message id 2743 on the response queue
>> but it's unrelated to preempt
>>
>> and on SM8450:
>> platform 3d6a000.gmu: [drm:a6xx_gmu_set_oob [msm]] *ERROR* Timeout waiting for GMU OOB set GPU_SET: 0x0
>> msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 7.3.0.1: hangcheck detected gpu lockup rb 0!
>> msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 7.3.0.1:     completed fence: 331235
>> msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 7.3.0.1:     submitted fence: 331236
>> adreno 3d00000.gpu: [drm:a6xx_irq [msm]] *ERROR* gpu fault ring 0 fence 50de4 status 00800005 rb 0000/0699 ib1 0000000000000000/0000 ib2 0000000000000000/0000
>> msm_dpu ae01000.display-controller: [drm:recover_worker [msm]] *ERROR* 7.3.0.1: hangcheck recover!
>> msm_dpu ae01000.display-controller: [drm:recover_worker [msm]] *ERROR* 7.3.0.1: offending task: deqp-vk (/usr/lib/vulkan-cts/deqp-vk)
>> msm_dpu ae01000.display-controller: [drm:recover_worker [msm]] *ERROR* 7.3.0.1: hangcheck recover!
>> leading to a VK_ERROR_DEVICE_LOST, but again unrelated to preempt support.
>>
>> So you can also add:
>> Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8550-QRD
>> Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8450-HDK
>>
> 
> Neil,
> 
> On my x1e device, all submissions were somehow going into a single
> ring, even the compositor's. Not sure why. So effectively preemption was
> not really exercised. For one of the two benchmarks I ran, I had to use
> the "hiprio" mesa debug flag to force its submissions to ring 0.

I think that is because GL applications (so most compositors) run
through Zink, which does not forward GL priority to Vulkan, so yeah, for
GL applications the only way of getting preemption is the debug flag.

Unfortunately this is not easy to fix in Zink because it creates one
VkDevice at screen creation and uses it for all GL contexts. Since GL
priority is provided per context, at context creation time, Zink has
no way of handling this.

Once TU supports more than one queue, it will be possible for Zink to
create one queue per priority and then pick one at context creation time.
Doing so would require a new Vulkan extension for per-queue global
priority. I started working on this some time ago:
https://gitlab.freedesktop.org/antonino/mesa/-/tree/priority_ext?ref_type=heads
but this solution will only be viable once TU can expose more than one
queue.
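
To make the idea concrete, here is a toy model of "one queue per
priority, picked at context creation time". It is only a sketch: it does
not use the Vulkan or Zink APIs, and the names `Screen`, `Context`,
`Queue` and the three priority levels are hypothetical stand-ins chosen
for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical priority levels; real drivers may expose a different set.
PRIORITIES = ("low", "medium", "high")


@dataclass
class Queue:
    priority: str


@dataclass
class Screen:
    """Stands in for the single device created at screen creation."""
    queues: dict = field(default_factory=dict)

    def __post_init__(self):
        # All queues are created up front, one per priority,
        # because the device is created before any context exists.
        self.queues = {p: Queue(p) for p in PRIORITIES}

    def create_context(self, priority="medium"):
        # The priority is only known here, so the context picks
        # one of the queues that already exist on the shared device.
        return Context(queue=self.queues[priority])


@dataclass
class Context:
    queue: Queue


screen = Screen()
compositor = screen.create_context("high")
app = screen.create_context()
print(compositor.queue.priority, app.queue.priority)
```

The point of the model is that the device stays shared while the
priority decision moves to context creation, which is why it depends on
the driver exposing more than one queue.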

> 
> If possible it is a good idea to check the new preemption traces to
> ensure preemption kicks in.
> 
> -Akhil
> 
>> Thanks,
>> Neil


Best regards,
-- 
Antonino Maniscalco <antomani103@gmail.com>


* Re: [PATCH v4 00/11] Preemption support for A7XX
  2024-09-24 11:53     ` Antonino Maniscalco
@ 2024-09-24 14:47       ` Rob Clark
  2024-09-24 15:22         ` Akhil P Oommen
  2024-09-24 17:45         ` Antonino Maniscalco
  0 siblings, 2 replies; 35+ messages in thread
From: Rob Clark @ 2024-09-24 14:47 UTC (permalink / raw)
  To: Antonino Maniscalco
  Cc: Akhil P Oommen, Neil Armstrong, Sean Paul, Konrad Dybcio,
	Abhinav Kumar, Dmitry Baryshkov, Marijn Suijten, David Airlie,
	Daniel Vetter, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, Jonathan Corbet, linux-arm-msm, dri-devel,
	freedreno, linux-kernel, linux-doc, Sharat Masetty

On Tue, Sep 24, 2024 at 4:54 AM Antonino Maniscalco
<antomani103@gmail.com> wrote:
>
> On 9/20/24 7:09 PM, Akhil P Oommen wrote:
> > On Wed, Sep 18, 2024 at 09:46:33AM +0200, Neil Armstrong wrote:
> >> Hi,
> >>
> >> On 17/09/2024 13:14, Antonino Maniscalco wrote:
> >>> This series implements preemption for A7XX targets, which allows the GPU to
> > >>> switch to a higher priority ring when work is pushed to it, reducing latency
> >>> for high priority submissions.
> >>>
> >>> This series enables L1 preemption with skip_save_restore which requires
> >>> the following userspace patches to function:
> >>>
> >>> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30544
> >>>
> >>> A flag is added to `msm_submitqueue_create` to only allow submissions
> >>> from compatible userspace to be preempted, therefore maintaining
> >>> compatibility.
> >>>
> >>> Preemption is currently only enabled by default on A750, it can be
> >>> enabled on other targets through the `enable_preemption` module
> >>> parameter. This is because more testing is required on other targets.
> >>>
> > >>> For testing on other HW it is sufficient to set that parameter to a
> > >>> value of 1. Then, using the branch of mesa linked above, `TU_DEBUG=hiprio`
> > >>> allows running any application as high priority, therefore preempting
> > >>> submissions from other applications.
> >>>
> >>> The `msm_gpu_preemption_trigger` and `msm_gpu_preemption_irq` traces
> >>> added in this series can be used to observe preemption's behavior as
> >>> well as measuring preemption latency.
> >>>
> >>> Some commits from this series are based on a previous series to enable
> >>> preemption on A6XX targets:
> >>>
> >>> https://lkml.kernel.org/1520489185-21828-1-git-send-email-smasetty@codeaurora.org
> >>>
> >>> Signed-off-by: Antonino Maniscalco <antomani103@gmail.com>
> >>> ---
> >>> Changes in v4:
> >>> - Added missing register in pwrup list
> > >>> - Removed and rearranged barriers
> >>> - Renamed `skip_inline_wptr` to `restore_wptr`
> >>> - Track ctx seqno per ring
> >>> - Removed secure preempt context
> >>> - NOP out postamble to disable it instantly
> >>> - Only emit pwrup reglist once
> >>> - Document bv_rptr_addr
> >>> - Removed unused A6XX_PREEMPT_USER_RECORD_SIZE
> >>> - Set name on preempt record buffer
> >>> - Link to v3: https://lore.kernel.org/r/20240905-preemption-a750-t-v3-0-fd947699f7bc@gmail.com
> >>>
> >>> Changes in v3:
> >>> - Added documentation about preemption
> >>> - Use quirks to determine which target supports preemption
> >>> - Add a module parameter to force disabling or enabling preemption
> >>> - Clear postamble when profiling
> >>> - Define A6XX_CP_CONTEXT_SWITCH_CNTL_LEVEL fields in a6xx.xml
> >>> - Make preemption records MAP_PRIV
> >>> - Removed user ctx record (NON_PRIV) and patch 2/9 as it's not needed
> >>>     anymore
> >>> - Link to v2: https://lore.kernel.org/r/20240830-preemption-a750-t-v2-0-86aeead2cd80@gmail.com
> >>>
> >>> Changes in v2:
> > >>> - Added preempt_record_size for X185 in PATCH 3/7
> >>> - Added patches to reset perf counters
> >>> - Dropped unused defines
> >>> - Dropped unused variable (fixes warning)
> >>> - Only enable preemption on a750
> >>> - Reject MSM_SUBMITQUEUE_ALLOW_PREEMPT for unsupported targets
> >>> - Added Akhil's Reviewed-By tags to patches 1/9,2/9,3/9
> >>> - Added Neil's Tested-By tags
> >>> - Added explanation for UAPI changes in commit message
> >>> - Link to v1: https://lore.kernel.org/r/20240815-preemption-a750-t-v1-0-7bda26c34037@gmail.com
> >>>
> >>> ---
> >>> Antonino Maniscalco (11):
> >>>         drm/msm: Fix bv_fence being used as bv_rptr
> >>>         drm/msm/A6XX: Track current_ctx_seqno per ring
> >>>         drm/msm: Add a `preempt_record_size` field
> >>>         drm/msm: Add CONTEXT_SWITCH_CNTL bitfields
> >>>         drm/msm/A6xx: Implement preemption for A7XX targets
> >>>         drm/msm/A6xx: Sync relevant adreno_pm4.xml changes
> >>>         drm/msm/A6xx: Use posamble to reset counters on preemption
> >>>         drm/msm/A6xx: Add traces for preemption
> >>>         drm/msm/A6XX: Add a flag to allow preemption to submitqueue_create
> >>>         drm/msm/A6xx: Enable preemption for A750
> >>>         Documentation: document adreno preemption
> >>>
> >>>    Documentation/gpu/msm-preemption.rst               |  98 +++++
> >>>    drivers/gpu/drm/msm/Makefile                       |   1 +
> >>>    drivers/gpu/drm/msm/adreno/a2xx_gpu.c              |   2 +-
> >>>    drivers/gpu/drm/msm/adreno/a3xx_gpu.c              |   2 +-
> >>>    drivers/gpu/drm/msm/adreno/a4xx_gpu.c              |   2 +-
> >>>    drivers/gpu/drm/msm/adreno/a5xx_gpu.c              |   6 +-
> >>>    drivers/gpu/drm/msm/adreno/a6xx_catalog.c          |   7 +-
> >>>    drivers/gpu/drm/msm/adreno/a6xx_gpu.c              | 325 ++++++++++++++-
> >>>    drivers/gpu/drm/msm/adreno/a6xx_gpu.h              | 174 ++++++++
> >>>    drivers/gpu/drm/msm/adreno/a6xx_preempt.c          | 440 +++++++++++++++++++++
> >>>    drivers/gpu/drm/msm/adreno/adreno_gpu.h            |   9 +-
> >>>    drivers/gpu/drm/msm/msm_drv.c                      |   4 +
> >>>    drivers/gpu/drm/msm/msm_gpu.c                      |   2 +-
> >>>    drivers/gpu/drm/msm/msm_gpu.h                      |  11 -
> >>>    drivers/gpu/drm/msm/msm_gpu_trace.h                |  28 ++
> >>>    drivers/gpu/drm/msm/msm_ringbuffer.h               |  18 +
> >>>    drivers/gpu/drm/msm/msm_submitqueue.c              |   3 +
> >>>    drivers/gpu/drm/msm/registers/adreno/a6xx.xml      |   7 +-
> >>>    .../gpu/drm/msm/registers/adreno/adreno_pm4.xml    |  39 +-
> >>>    include/uapi/drm/msm_drm.h                         |   5 +-
> >>>    20 files changed, 1117 insertions(+), 66 deletions(-)
> >>> ---
> >>> base-commit: 7c626ce4bae1ac14f60076d00eafe71af30450ba
> >>> change-id: 20240815-preemption-a750-t-fcee9a844b39
> >>>
> >>> Best regards,
> >>
> >> I've been running vulkan-cts (1.3.7.3-0-gd71a36db16d98313c431829432a136dbda692a08 from Yocto)
> >> on SM8650-QRD, SM8550-QRD & SM8450-HDK boards with enable_preemption in default value
> >> and forced to 1, and I've seen no regression so far
> >>
> >> On SM8550, I've seen a few:
> >> platform 3d6a000.gmu: [drm:a6xx_hfi_send_msg.constprop.0 [msm]] *ERROR* Message HFI_H2F_MSG_GX_BW_PERF_VOTE id 2743 timed out waiting for response
> >> platform 3d6a000.gmu: [drm:a6xx_hfi_send_msg.constprop.0 [msm]] *ERROR* Unexpected message id 2743 on the response queue
> >> but it's unrelated to preempt
> >>
> >> and on SM8450:
> >> platform 3d6a000.gmu: [drm:a6xx_gmu_set_oob [msm]] *ERROR* Timeout waiting for GMU OOB set GPU_SET: 0x0
> >> msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 7.3.0.1: hangcheck detected gpu lockup rb 0!
> >> msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 7.3.0.1:     completed fence: 331235
> >> msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 7.3.0.1:     submitted fence: 331236
> >> adreno 3d00000.gpu: [drm:a6xx_irq [msm]] *ERROR* gpu fault ring 0 fence 50de4 status 00800005 rb 0000/0699 ib1 0000000000000000/0000 ib2 0000000000000000/0000
> >> msm_dpu ae01000.display-controller: [drm:recover_worker [msm]] *ERROR* 7.3.0.1: hangcheck recover!
> >> msm_dpu ae01000.display-controller: [drm:recover_worker [msm]] *ERROR* 7.3.0.1: offending task: deqp-vk (/usr/lib/vulkan-cts/deqp-vk)
> >> msm_dpu ae01000.display-controller: [drm:recover_worker [msm]] *ERROR* 7.3.0.1: hangcheck recover!
> >> leading to a VK_ERROR_DEVICE_LOST, but again unrelated to preempt support.
> >>
> >> So you can also add:
> >> Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8550-QRD
> >> Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8450-HDK
> >>
> >
> > Neil,
> >
> > On my x1e device, all submissions were somehow going into a single
> > ring, even the compositor's. Not sure why. So effectively preemption was
> > not really exercised. For one of the two benchmarks I ran, I had to use
> > the "hiprio" mesa debug flag to force its submissions to ring 0.
>
> I think that is because GL applications (so most compositors) run
> through Zink, which does not forward GL priority to Vulkan, so yeah, for
> GL applications the only way of getting preemption is the debug flag.

I guess if it is mesa 24.2.x or newer it would be using the gallium
driver.  Which I guess would need xAMBLE stuff wired up.  Outside of
fd6_emit_restore() and fd6_gmem.cc there isn't really any state emit
in IB1, so I guess it probably wouldn't be too hard to get preemption
support wired up.

BR,
-R

> Unfortunately this is not easy to fix in Zink because it creates one
> VkDevice at screen creation and uses it for all GL contexts. Since GL
> priority is provided per context and at context creation time Zink has
> no way of handling this.
>
> Once TU supports more than one queue it will be possible for Zink to
> create one queue per priority then pick one at context creation time.
> Doing so would require a new vulkan extension for per queue global
> priority. I had started working on this some time ago
> https://gitlab.freedesktop.org/antonino/mesa/-/tree/priority_ext?ref_type=heads
> but this solution will only be viable once TU can expose more than one
> queue.
>
> >
> > If possible it is a good idea to check the new preemption traces to
> > ensure preemption kicks in.
> >
> > -Akhil
> >
> >> Thanks,
> >> Neil
>
>
> Best regards,
> --
> Antonino Maniscalco <antomani103@gmail.com>


* Re: [PATCH v4 00/11] Preemption support for A7XX
  2024-09-24 14:47       ` Rob Clark
@ 2024-09-24 15:22         ` Akhil P Oommen
  2024-09-24 17:47           ` Antonino Maniscalco
  2024-09-25 19:37           ` Rob Clark
  2024-09-24 17:45         ` Antonino Maniscalco
  1 sibling, 2 replies; 35+ messages in thread
From: Akhil P Oommen @ 2024-09-24 15:22 UTC (permalink / raw)
  To: Rob Clark
  Cc: Antonino Maniscalco, Neil Armstrong, Sean Paul, Konrad Dybcio,
	Abhinav Kumar, Dmitry Baryshkov, Marijn Suijten, David Airlie,
	Daniel Vetter, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, Jonathan Corbet, linux-arm-msm, dri-devel,
	freedreno, linux-kernel, linux-doc, Sharat Masetty

On Tue, Sep 24, 2024 at 07:47:12AM -0700, Rob Clark wrote:
> On Tue, Sep 24, 2024 at 4:54 AM Antonino Maniscalco
> <antomani103@gmail.com> wrote:
> >
> > On 9/20/24 7:09 PM, Akhil P Oommen wrote:
> > > On Wed, Sep 18, 2024 at 09:46:33AM +0200, Neil Armstrong wrote:
> > >> Hi,
> > >>
> > >> On 17/09/2024 13:14, Antonino Maniscalco wrote:
> > >>> This series implements preemption for A7XX targets, which allows the GPU to
> > > >>> switch to a higher priority ring when work is pushed to it, reducing latency
> > >>> for high priority submissions.
> > >>>
> > >>> This series enables L1 preemption with skip_save_restore which requires
> > >>> the following userspace patches to function:
> > >>>
> > >>> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30544
> > >>>
> > >>> A flag is added to `msm_submitqueue_create` to only allow submissions
> > >>> from compatible userspace to be preempted, therefore maintaining
> > >>> compatibility.
> > >>>
> > >>> Preemption is currently only enabled by default on A750, it can be
> > >>> enabled on other targets through the `enable_preemption` module
> > >>> parameter. This is because more testing is required on other targets.
> > >>>
> > >>> For testing on other HW it is sufficient to set that parameter to a
> > >>> value of 1, then using the branch of mesa linked above, `TU_DEBUG=hiprio`
> > >>> allows to run any application as high priority therefore preempting
> > >>> submissions from other applications.
> > >>>
> > >>> The `msm_gpu_preemption_trigger` and `msm_gpu_preemption_irq` traces
> > >>> added in this series can be used to observe preemption's behavior as
> > >>> well as measuring preemption latency.
> > >>>
> > >>> Some commits from this series are based on a previous series to enable
> > >>> preemption on A6XX targets:
> > >>>
> > >>> https://lkml.kernel.org/1520489185-21828-1-git-send-email-smasetty@codeaurora.org
> > >>>
> > >>> Signed-off-by: Antonino Maniscalco <antomani103@gmail.com>
> > >>> ---
> > >>> Changes in v4:
> > >>> - Added missing register in pwrup list
> > > >>> - Removed and rearranged barriers
> > >>> - Renamed `skip_inline_wptr` to `restore_wptr`
> > >>> - Track ctx seqno per ring
> > >>> - Removed secure preempt context
> > >>> - NOP out postamble to disable it instantly
> > >>> - Only emit pwrup reglist once
> > >>> - Document bv_rptr_addr
> > >>> - Removed unused A6XX_PREEMPT_USER_RECORD_SIZE
> > >>> - Set name on preempt record buffer
> > >>> - Link to v3: https://lore.kernel.org/r/20240905-preemption-a750-t-v3-0-fd947699f7bc@gmail.com
> > >>>
> > >>> Changes in v3:
> > >>> - Added documentation about preemption
> > >>> - Use quirks to determine which target supports preemption
> > >>> - Add a module parameter to force disabling or enabling preemption
> > >>> - Clear postamble when profiling
> > >>> - Define A6XX_CP_CONTEXT_SWITCH_CNTL_LEVEL fields in a6xx.xml
> > >>> - Make preemption records MAP_PRIV
> > >>> - Removed user ctx record (NON_PRIV) and patch 2/9 as it's not needed
> > >>>     anymore
> > >>> - Link to v2: https://lore.kernel.org/r/20240830-preemption-a750-t-v2-0-86aeead2cd80@gmail.com
> > >>>
> > >>> Changes in v2:
> > > >>> - Added preempt_record_size for X185 in PATCH 3/7
> > >>> - Added patches to reset perf counters
> > >>> - Dropped unused defines
> > >>> - Dropped unused variable (fixes warning)
> > >>> - Only enable preemption on a750
> > >>> - Reject MSM_SUBMITQUEUE_ALLOW_PREEMPT for unsupported targets
> > >>> - Added Akhil's Reviewed-By tags to patches 1/9,2/9,3/9
> > >>> - Added Neil's Tested-By tags
> > >>> - Added explanation for UAPI changes in commit message
> > >>> - Link to v1: https://lore.kernel.org/r/20240815-preemption-a750-t-v1-0-7bda26c34037@gmail.com
> > >>>
> > >>> ---
> > >>> Antonino Maniscalco (11):
> > >>>         drm/msm: Fix bv_fence being used as bv_rptr
> > >>>         drm/msm/A6XX: Track current_ctx_seqno per ring
> > >>>         drm/msm: Add a `preempt_record_size` field
> > >>>         drm/msm: Add CONTEXT_SWITCH_CNTL bitfields
> > >>>         drm/msm/A6xx: Implement preemption for A7XX targets
> > >>>         drm/msm/A6xx: Sync relevant adreno_pm4.xml changes
> > >>>         drm/msm/A6xx: Use posamble to reset counters on preemption
> > >>>         drm/msm/A6xx: Add traces for preemption
> > >>>         drm/msm/A6XX: Add a flag to allow preemption to submitqueue_create
> > >>>         drm/msm/A6xx: Enable preemption for A750
> > >>>         Documentation: document adreno preemption
> > >>>
> > >>>    Documentation/gpu/msm-preemption.rst               |  98 +++++
> > >>>    drivers/gpu/drm/msm/Makefile                       |   1 +
> > >>>    drivers/gpu/drm/msm/adreno/a2xx_gpu.c              |   2 +-
> > >>>    drivers/gpu/drm/msm/adreno/a3xx_gpu.c              |   2 +-
> > >>>    drivers/gpu/drm/msm/adreno/a4xx_gpu.c              |   2 +-
> > >>>    drivers/gpu/drm/msm/adreno/a5xx_gpu.c              |   6 +-
> > >>>    drivers/gpu/drm/msm/adreno/a6xx_catalog.c          |   7 +-
> > >>>    drivers/gpu/drm/msm/adreno/a6xx_gpu.c              | 325 ++++++++++++++-
> > >>>    drivers/gpu/drm/msm/adreno/a6xx_gpu.h              | 174 ++++++++
> > >>>    drivers/gpu/drm/msm/adreno/a6xx_preempt.c          | 440 +++++++++++++++++++++
> > >>>    drivers/gpu/drm/msm/adreno/adreno_gpu.h            |   9 +-
> > >>>    drivers/gpu/drm/msm/msm_drv.c                      |   4 +
> > >>>    drivers/gpu/drm/msm/msm_gpu.c                      |   2 +-
> > >>>    drivers/gpu/drm/msm/msm_gpu.h                      |  11 -
> > >>>    drivers/gpu/drm/msm/msm_gpu_trace.h                |  28 ++
> > >>>    drivers/gpu/drm/msm/msm_ringbuffer.h               |  18 +
> > >>>    drivers/gpu/drm/msm/msm_submitqueue.c              |   3 +
> > >>>    drivers/gpu/drm/msm/registers/adreno/a6xx.xml      |   7 +-
> > >>>    .../gpu/drm/msm/registers/adreno/adreno_pm4.xml    |  39 +-
> > >>>    include/uapi/drm/msm_drm.h                         |   5 +-
> > >>>    20 files changed, 1117 insertions(+), 66 deletions(-)
> > >>> ---
> > >>> base-commit: 7c626ce4bae1ac14f60076d00eafe71af30450ba
> > >>> change-id: 20240815-preemption-a750-t-fcee9a844b39
> > >>>
> > >>> Best regards,
> > >>
> > >> I've been running vulkan-cts (1.3.7.3-0-gd71a36db16d98313c431829432a136dbda692a08 from Yocto)
> > >> on SM8650-QRD, SM8550-QRD & SM8450-HDK boards with enable_preemption in default value
> > >> and forced to 1, and I've seen no regression so far
> > >>
> > >> On SM8550, I've seen a few:
> > >> platform 3d6a000.gmu: [drm:a6xx_hfi_send_msg.constprop.0 [msm]] *ERROR* Message HFI_H2F_MSG_GX_BW_PERF_VOTE id 2743 timed out waiting for response
> > >> platform 3d6a000.gmu: [drm:a6xx_hfi_send_msg.constprop.0 [msm]] *ERROR* Unexpected message id 2743 on the response queue
> > >> but it's unrelated to preempt
> > >>
> > >> and on SM8450:
> > >> platform 3d6a000.gmu: [drm:a6xx_gmu_set_oob [msm]] *ERROR* Timeout waiting for GMU OOB set GPU_SET: 0x0
> > >> msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 7.3.0.1: hangcheck detected gpu lockup rb 0!
> > >> msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 7.3.0.1:     completed fence: 331235
> > >> msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 7.3.0.1:     submitted fence: 331236
> > >> adreno 3d00000.gpu: [drm:a6xx_irq [msm]] *ERROR* gpu fault ring 0 fence 50de4 status 00800005 rb 0000/0699 ib1 0000000000000000/0000 ib2 0000000000000000/0000
> > >> msm_dpu ae01000.display-controller: [drm:recover_worker [msm]] *ERROR* 7.3.0.1: hangcheck recover!
> > >> msm_dpu ae01000.display-controller: [drm:recover_worker [msm]] *ERROR* 7.3.0.1: offending task: deqp-vk (/usr/lib/vulkan-cts/deqp-vk)
> > >> msm_dpu ae01000.display-controller: [drm:recover_worker [msm]] *ERROR* 7.3.0.1: hangcheck recover!
> > >> leading to a VK_ERROR_DEVICE_LOST, but again unrelated to preempt support.
> > >>
> > >> So you can also add:
> > >> Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8550-QRD
> > >> Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8450-HDK
> > >>
> > >
> > > > Neil,
> > > >
> > > > On my x1e device, all submissions were somehow going into only a single
> > > > ring, even the compositor's. Not sure why. So effectively preemption was
> > > > not really exercised. I had to use the "highprio" mesa debug flag to
> > > > force the submissions of one of the two benchmarks I ran to ring 0.
> >
> > I think that is because GL applications (so most compositors) run
> > through zink which does not forward GL preemption to vulkan so yeah, for
> > GL applications the only way of getting preemption is the debug flag.
> 
> I guess if it is mesa 24.2.x or newer it would be using the gallium
> driver.  Which I guess would need xAMBLE stuff wired up.  Outside of
> fd6_emit_restore() and fd6_gmem.cc there isn't really any state emit
> in IB1, so I guess it probably wouldn't be too hard to get preemption
> support wired up.
> 
> BR,
> -R
> 
> > Unfortunately this is not easy to fix in Zink because it creates one
> > VkDevice at screen creation and uses it for all GL contexts. Since GL
> > priority is provided per context and at context creation time Zink has
> > no way of handling this.
> >
> > > Once TU supports more than one queue it will be possible for Zink to
> > create one queue per priority then pick one at context creation time.
> > Doing so would require a new vulkan extension for per queue global
> > priority. I had started working on this some time ago
> > https://gitlab.freedesktop.org/antonino/mesa/-/tree/priority_ext?ref_type=heads
> > but this solution will only be viable once TU can expose more than one
> > queue.
> >

Thanks, both of you for the clarification. Glad about the
improvements planned for both freedreno and zink.

-Akhil.

> > >
> > > If possible it is a good idea to check the new preemption traces to
> > > ensure preemption kicks in.
> > >
> > > -Akhil
> > >
> > >> Thanks,
> > >> Neil
> >
> >
> > Best regards,
> > --
> > Antonino Maniscalco <antomani103@gmail.com>


* Re: [PATCH v4 00/11] Preemption support for A7XX
  2024-09-24 15:22         ` Akhil P Oommen
@ 2024-09-24 17:47           ` Antonino Maniscalco
  2024-09-25 19:37           ` Rob Clark
  1 sibling, 0 replies; 35+ messages in thread
From: Antonino Maniscalco @ 2024-09-24 17:47 UTC (permalink / raw)
  To: Akhil P Oommen, Rob Clark
  Cc: Neil Armstrong, Sean Paul, Konrad Dybcio, Abhinav Kumar,
	Dmitry Baryshkov, Marijn Suijten, David Airlie, Daniel Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet, linux-arm-msm, dri-devel, freedreno,
	linux-kernel, linux-doc, Sharat Masetty

On 9/24/24 5:22 PM, Akhil P Oommen wrote:
> On Tue, Sep 24, 2024 at 07:47:12AM -0700, Rob Clark wrote:
>> On Tue, Sep 24, 2024 at 4:54 AM Antonino Maniscalco
>> <antomani103@gmail.com> wrote:
>>>
>>> On 9/20/24 7:09 PM, Akhil P Oommen wrote:
>>>> On Wed, Sep 18, 2024 at 09:46:33AM +0200, Neil Armstrong wrote:
>>>>> Hi,
>>>>>
>>>>> On 17/09/2024 13:14, Antonino Maniscalco wrote:
>>>>>> This series implements preemption for A7XX targets, which allows the GPU to
>>>>>> switch to a higher priority ring when work is pushed to it, reducing latency
>>>>>> for high priority submissions.
>>>>>>
>>>>>> This series enables L1 preemption with skip_save_restore which requires
>>>>>> the following userspace patches to function:
>>>>>>
>>>>>> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30544
>>>>>>
>>>>>> A flag is added to `msm_submitqueue_create` to only allow submissions
>>>>>> from compatible userspace to be preempted, therefore maintaining
>>>>>> compatibility.
>>>>>>
>>>>>> Preemption is currently only enabled by default on A750, it can be
>>>>>> enabled on other targets through the `enable_preemption` module
>>>>>> parameter. This is because more testing is required on other targets.
>>>>>>
>>>>>> For testing on other HW it is sufficient to set that parameter to a
>>>>>> value of 1, then using the branch of mesa linked above, `TU_DEBUG=hiprio`
>>>>>> allows to run any application as high priority therefore preempting
>>>>>> submissions from other applications.
>>>>>>
>>>>>> The `msm_gpu_preemption_trigger` and `msm_gpu_preemption_irq` traces
>>>>>> added in this series can be used to observe preemption's behavior as
>>>>>> well as measuring preemption latency.
>>>>>>
>>>>>> Some commits from this series are based on a previous series to enable
>>>>>> preemption on A6XX targets:
>>>>>>
>>>>>> https://lkml.kernel.org/1520489185-21828-1-git-send-email-smasetty@codeaurora.org
>>>>>>
>>>>>> Signed-off-by: Antonino Maniscalco <antomani103@gmail.com>
>>>>>> ---
>>>>>> Changes in v4:
>>>>>> - Added missing register in pwrup list
>>>>>> - Removed and rearranged barriers
>>>>>> - Renamed `skip_inline_wptr` to `restore_wptr`
>>>>>> - Track ctx seqno per ring
>>>>>> - Removed secure preempt context
>>>>>> - NOP out postamble to disable it instantly
>>>>>> - Only emit pwrup reglist once
>>>>>> - Document bv_rptr_addr
>>>>>> - Removed unused A6XX_PREEMPT_USER_RECORD_SIZE
>>>>>> - Set name on preempt record buffer
>>>>>> - Link to v3: https://lore.kernel.org/r/20240905-preemption-a750-t-v3-0-fd947699f7bc@gmail.com
>>>>>>
>>>>>> Changes in v3:
>>>>>> - Added documentation about preemption
>>>>>> - Use quirks to determine which target supports preemption
>>>>>> - Add a module parameter to force disabling or enabling preemption
>>>>>> - Clear postamble when profiling
>>>>>> - Define A6XX_CP_CONTEXT_SWITCH_CNTL_LEVEL fields in a6xx.xml
>>>>>> - Make preemption records MAP_PRIV
>>>>>> - Removed user ctx record (NON_PRIV) and patch 2/9 as it's not needed
>>>>>>      anymore
>>>>>> - Link to v2: https://lore.kernel.org/r/20240830-preemption-a750-t-v2-0-86aeead2cd80@gmail.com
>>>>>>
>>>>>> Changes in v2:
>>>>>> - Added preempt_record_size for X185 in PATCH 3/7
>>>>>> - Added patches to reset perf counters
>>>>>> - Dropped unused defines
>>>>>> - Dropped unused variable (fixes warning)
>>>>>> - Only enable preemption on a750
>>>>>> - Reject MSM_SUBMITQUEUE_ALLOW_PREEMPT for unsupported targets
>>>>>> - Added Akhil's Reviewed-By tags to patches 1/9,2/9,3/9
>>>>>> - Added Neil's Tested-By tags
>>>>>> - Added explanation for UAPI changes in commit message
>>>>>> - Link to v1: https://lore.kernel.org/r/20240815-preemption-a750-t-v1-0-7bda26c34037@gmail.com
>>>>>>
>>>>>> ---
>>>>>> Antonino Maniscalco (11):
>>>>>>          drm/msm: Fix bv_fence being used as bv_rptr
>>>>>>          drm/msm/A6XX: Track current_ctx_seqno per ring
>>>>>>          drm/msm: Add a `preempt_record_size` field
>>>>>>          drm/msm: Add CONTEXT_SWITCH_CNTL bitfields
>>>>>>          drm/msm/A6xx: Implement preemption for A7XX targets
>>>>>>          drm/msm/A6xx: Sync relevant adreno_pm4.xml changes
>>>>>>          drm/msm/A6xx: Use posamble to reset counters on preemption
>>>>>>          drm/msm/A6xx: Add traces for preemption
>>>>>>          drm/msm/A6XX: Add a flag to allow preemption to submitqueue_create
>>>>>>          drm/msm/A6xx: Enable preemption for A750
>>>>>>          Documentation: document adreno preemption
>>>>>>
>>>>>>     Documentation/gpu/msm-preemption.rst               |  98 +++++
>>>>>>     drivers/gpu/drm/msm/Makefile                       |   1 +
>>>>>>     drivers/gpu/drm/msm/adreno/a2xx_gpu.c              |   2 +-
>>>>>>     drivers/gpu/drm/msm/adreno/a3xx_gpu.c              |   2 +-
>>>>>>     drivers/gpu/drm/msm/adreno/a4xx_gpu.c              |   2 +-
>>>>>>     drivers/gpu/drm/msm/adreno/a5xx_gpu.c              |   6 +-
>>>>>>     drivers/gpu/drm/msm/adreno/a6xx_catalog.c          |   7 +-
>>>>>>     drivers/gpu/drm/msm/adreno/a6xx_gpu.c              | 325 ++++++++++++++-
>>>>>>     drivers/gpu/drm/msm/adreno/a6xx_gpu.h              | 174 ++++++++
>>>>>>     drivers/gpu/drm/msm/adreno/a6xx_preempt.c          | 440 +++++++++++++++++++++
>>>>>>     drivers/gpu/drm/msm/adreno/adreno_gpu.h            |   9 +-
>>>>>>     drivers/gpu/drm/msm/msm_drv.c                      |   4 +
>>>>>>     drivers/gpu/drm/msm/msm_gpu.c                      |   2 +-
>>>>>>     drivers/gpu/drm/msm/msm_gpu.h                      |  11 -
>>>>>>     drivers/gpu/drm/msm/msm_gpu_trace.h                |  28 ++
>>>>>>     drivers/gpu/drm/msm/msm_ringbuffer.h               |  18 +
>>>>>>     drivers/gpu/drm/msm/msm_submitqueue.c              |   3 +
>>>>>>     drivers/gpu/drm/msm/registers/adreno/a6xx.xml      |   7 +-
>>>>>>     .../gpu/drm/msm/registers/adreno/adreno_pm4.xml    |  39 +-
>>>>>>     include/uapi/drm/msm_drm.h                         |   5 +-
>>>>>>     20 files changed, 1117 insertions(+), 66 deletions(-)
>>>>>> ---
>>>>>> base-commit: 7c626ce4bae1ac14f60076d00eafe71af30450ba
>>>>>> change-id: 20240815-preemption-a750-t-fcee9a844b39
>>>>>>
>>>>>> Best regards,
>>>>>
>>>>> I've been running vulkan-cts (1.3.7.3-0-gd71a36db16d98313c431829432a136dbda692a08 from Yocto)
>>>>> on SM8650-QRD, SM8550-QRD & SM8450-HDK boards with enable_preemption in default value
>>>>> and forced to 1, and I've seen no regression so far
>>>>>
>>>>> On SM8550, I've seen a few:
>>>>> platform 3d6a000.gmu: [drm:a6xx_hfi_send_msg.constprop.0 [msm]] *ERROR* Message HFI_H2F_MSG_GX_BW_PERF_VOTE id 2743 timed out waiting for response
>>>>> platform 3d6a000.gmu: [drm:a6xx_hfi_send_msg.constprop.0 [msm]] *ERROR* Unexpected message id 2743 on the response queue
>>>>> but it's unrelated to preempt
>>>>>
>>>>> and on SM8450:
>>>>> platform 3d6a000.gmu: [drm:a6xx_gmu_set_oob [msm]] *ERROR* Timeout waiting for GMU OOB set GPU_SET: 0x0
>>>>> msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 7.3.0.1: hangcheck detected gpu lockup rb 0!
>>>>> msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 7.3.0.1:     completed fence: 331235
>>>>> msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 7.3.0.1:     submitted fence: 331236
>>>>> adreno 3d00000.gpu: [drm:a6xx_irq [msm]] *ERROR* gpu fault ring 0 fence 50de4 status 00800005 rb 0000/0699 ib1 0000000000000000/0000 ib2 0000000000000000/0000
>>>>> msm_dpu ae01000.display-controller: [drm:recover_worker [msm]] *ERROR* 7.3.0.1: hangcheck recover!
>>>>> msm_dpu ae01000.display-controller: [drm:recover_worker [msm]] *ERROR* 7.3.0.1: offending task: deqp-vk (/usr/lib/vulkan-cts/deqp-vk)
>>>>> msm_dpu ae01000.display-controller: [drm:recover_worker [msm]] *ERROR* 7.3.0.1: hangcheck recover!
>>>>> leading to a VK_ERROR_DEVICE_LOST, but again unrelated to preempt support.
>>>>>
>>>>> So you can also add:
>>>>> Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8550-QRD
>>>>> Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8450-HDK
>>>>>
>>>>
>>>> Neil,
>>>>
>>>> On my x1e device, all submissions were somehow going into only a single
>>>> ring, even the compositor's. Not sure why. So effectively preemption was
>>>> not really exercised. I had to use the "highprio" mesa debug flag to
>>>> force the submissions of one of the two benchmarks I ran to ring 0.
>>>
>>> I think that is because GL applications (so most compositors) run
>>> through zink which does not forward GL preemption to vulkan so yeah, for
>>> GL applications the only way of getting preemption is the debug flag.
>>
>> I guess if it is mesa 24.2.x or newer it would be using the gallium
>> driver.  Which I guess would need xAMBLE stuff wired up.  Outside of
>> fd6_emit_restore() and fd6_gmem.cc there isn't really any state emit
>> in IB1, so I guess it probably wouldn't be too hard to get preemption
>> support wired up.
>>
>> BR,
>> -R
>>
>>> Unfortunately this is not easy to fix in Zink because it creates one
>>> VkDevice at screen creation and uses it for all GL contexts. Since GL
>>> priority is provided per context and at context creation time Zink has
>>> no way of handling this.
>>>
>>> Once TU supports more than one queue it will be possible for Zink to
>>> create one queue per priority then pick one at context creation time.
>>> Doing so would require a new vulkan extension for per queue global
>>> priority. I had started working on this some time ago
>>> https://gitlab.freedesktop.org/antonino/mesa/-/tree/priority_ext?ref_type=heads
>>> but this solution will only be viable once TU can expose more than one
>>> queue.
>>>
> 
> Thanks, both of you for the clarification. Glad about the
> improvements planned for both freedreno and zink.

Np!

BTW Thanks for all the reviews.

> 
> -Akhil.
> 
>>>>
>>>> If possible it is a good idea to check the new preemption traces to
>>>> ensure preemption kicks in.
>>>>
>>>> -Akhil
>>>>
>>>>> Thanks,
>>>>> Neil
>>>
>>>
>>> Best regards,
>>> --
>>> Antonino Maniscalco <antomani103@gmail.com>


Best regards,
-- 
Antonino Maniscalco <antomani103@gmail.com>


* Re: [PATCH v4 00/11] Preemption support for A7XX
  2024-09-24 15:22         ` Akhil P Oommen
  2024-09-24 17:47           ` Antonino Maniscalco
@ 2024-09-25 19:37           ` Rob Clark
  1 sibling, 0 replies; 35+ messages in thread
From: Rob Clark @ 2024-09-25 19:37 UTC (permalink / raw)
  To: Akhil P Oommen
  Cc: Antonino Maniscalco, Neil Armstrong, Sean Paul, Konrad Dybcio,
	Abhinav Kumar, Dmitry Baryshkov, Marijn Suijten, David Airlie,
	Daniel Vetter, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, Jonathan Corbet, linux-arm-msm, dri-devel,
	freedreno, linux-kernel, linux-doc, Sharat Masetty

On Tue, Sep 24, 2024 at 8:22 AM Akhil P Oommen <quic_akhilpo@quicinc.com> wrote:
>
> On Tue, Sep 24, 2024 at 07:47:12AM -0700, Rob Clark wrote:
> > On Tue, Sep 24, 2024 at 4:54 AM Antonino Maniscalco
> > <antomani103@gmail.com> wrote:
> > >
> > > On 9/20/24 7:09 PM, Akhil P Oommen wrote:
> > > > On Wed, Sep 18, 2024 at 09:46:33AM +0200, Neil Armstrong wrote:
> > > >> Hi,
> > > >>
> > > >> On 17/09/2024 13:14, Antonino Maniscalco wrote:
> > > >>> This series implements preemption for A7XX targets, which allows the GPU to
> > > > >>> switch to a higher priority ring when work is pushed to it, reducing latency
> > > >>> for high priority submissions.
> > > >>>
> > > >>> This series enables L1 preemption with skip_save_restore which requires
> > > >>> the following userspace patches to function:
> > > >>>
> > > >>> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30544
> > > >>>
> > > >>> A flag is added to `msm_submitqueue_create` to only allow submissions
> > > >>> from compatible userspace to be preempted, therefore maintaining
> > > >>> compatibility.
> > > >>>
> > > >>> Preemption is currently only enabled by default on A750, it can be
> > > >>> enabled on other targets through the `enable_preemption` module
> > > >>> parameter. This is because more testing is required on other targets.
> > > >>>
> > > >>> For testing on other HW it is sufficient to set that parameter to a
> > > >>> value of 1, then using the branch of mesa linked above, `TU_DEBUG=hiprio`
> > > >>> allows to run any application as high priority therefore preempting
> > > >>> submissions from other applications.
> > > >>>
> > > >>> The `msm_gpu_preemption_trigger` and `msm_gpu_preemption_irq` traces
> > > >>> added in this series can be used to observe preemption's behavior as
> > > >>> well as measuring preemption latency.
> > > >>>
> > > >>> Some commits from this series are based on a previous series to enable
> > > >>> preemption on A6XX targets:
> > > >>>
> > > >>> https://lkml.kernel.org/1520489185-21828-1-git-send-email-smasetty@codeaurora.org
> > > >>>
> > > >>> Signed-off-by: Antonino Maniscalco <antomani103@gmail.com>
> > > >>> ---
> > > >>> Changes in v4:
> > > >>> - Added missing register in pwrup list
> > > > >>> - Removed and rearranged barriers
> > > >>> - Renamed `skip_inline_wptr` to `restore_wptr`
> > > >>> - Track ctx seqno per ring
> > > >>> - Removed secure preempt context
> > > >>> - NOP out postamble to disable it instantly
> > > >>> - Only emit pwrup reglist once
> > > >>> - Document bv_rptr_addr
> > > >>> - Removed unused A6XX_PREEMPT_USER_RECORD_SIZE
> > > >>> - Set name on preempt record buffer
> > > >>> - Link to v3: https://lore.kernel.org/r/20240905-preemption-a750-t-v3-0-fd947699f7bc@gmail.com
> > > >>>
> > > >>> Changes in v3:
> > > >>> - Added documentation about preemption
> > > >>> - Use quirks to determine which target supports preemption
> > > >>> - Add a module parameter to force disabling or enabling preemption
> > > >>> - Clear postamble when profiling
> > > >>> - Define A6XX_CP_CONTEXT_SWITCH_CNTL_LEVEL fields in a6xx.xml
> > > >>> - Make preemption records MAP_PRIV
> > > >>> - Removed user ctx record (NON_PRIV) and patch 2/9 as it's not needed
> > > >>>     anymore
> > > >>> - Link to v2: https://lore.kernel.org/r/20240830-preemption-a750-t-v2-0-86aeead2cd80@gmail.com
> > > >>>
> > > >>> Changes in v2:
> > > > >>> - Added preempt_record_size for X185 in PATCH 3/7
> > > >>> - Added patches to reset perf counters
> > > >>> - Dropped unused defines
> > > >>> - Dropped unused variable (fixes warning)
> > > >>> - Only enable preemption on a750
> > > >>> - Reject MSM_SUBMITQUEUE_ALLOW_PREEMPT for unsupported targets
> > > >>> - Added Akhil's Reviewed-By tags to patches 1/9,2/9,3/9
> > > >>> - Added Neil's Tested-By tags
> > > >>> - Added explanation for UAPI changes in commit message
> > > >>> - Link to v1: https://lore.kernel.org/r/20240815-preemption-a750-t-v1-0-7bda26c34037@gmail.com
> > > >>>
> > > >>> ---
> > > >>> Antonino Maniscalco (11):
> > > >>>         drm/msm: Fix bv_fence being used as bv_rptr
> > > >>>         drm/msm/A6XX: Track current_ctx_seqno per ring
> > > >>>         drm/msm: Add a `preempt_record_size` field
> > > >>>         drm/msm: Add CONTEXT_SWITCH_CNTL bitfields
> > > >>>         drm/msm/A6xx: Implement preemption for A7XX targets
> > > >>>         drm/msm/A6xx: Sync relevant adreno_pm4.xml changes
> > > >>>         drm/msm/A6xx: Use posamble to reset counters on preemption
> > > >>>         drm/msm/A6xx: Add traces for preemption
> > > >>>         drm/msm/A6XX: Add a flag to allow preemption to submitqueue_create
> > > >>>         drm/msm/A6xx: Enable preemption for A750
> > > >>>         Documentation: document adreno preemption
> > > >>>
> > > >>>    Documentation/gpu/msm-preemption.rst               |  98 +++++
> > > >>>    drivers/gpu/drm/msm/Makefile                       |   1 +
> > > >>>    drivers/gpu/drm/msm/adreno/a2xx_gpu.c              |   2 +-
> > > >>>    drivers/gpu/drm/msm/adreno/a3xx_gpu.c              |   2 +-
> > > >>>    drivers/gpu/drm/msm/adreno/a4xx_gpu.c              |   2 +-
> > > >>>    drivers/gpu/drm/msm/adreno/a5xx_gpu.c              |   6 +-
> > > >>>    drivers/gpu/drm/msm/adreno/a6xx_catalog.c          |   7 +-
> > > >>>    drivers/gpu/drm/msm/adreno/a6xx_gpu.c              | 325 ++++++++++++++-
> > > >>>    drivers/gpu/drm/msm/adreno/a6xx_gpu.h              | 174 ++++++++
> > > >>>    drivers/gpu/drm/msm/adreno/a6xx_preempt.c          | 440 +++++++++++++++++++++
> > > >>>    drivers/gpu/drm/msm/adreno/adreno_gpu.h            |   9 +-
> > > >>>    drivers/gpu/drm/msm/msm_drv.c                      |   4 +
> > > >>>    drivers/gpu/drm/msm/msm_gpu.c                      |   2 +-
> > > >>>    drivers/gpu/drm/msm/msm_gpu.h                      |  11 -
> > > >>>    drivers/gpu/drm/msm/msm_gpu_trace.h                |  28 ++
> > > >>>    drivers/gpu/drm/msm/msm_ringbuffer.h               |  18 +
> > > >>>    drivers/gpu/drm/msm/msm_submitqueue.c              |   3 +
> > > >>>    drivers/gpu/drm/msm/registers/adreno/a6xx.xml      |   7 +-
> > > >>>    .../gpu/drm/msm/registers/adreno/adreno_pm4.xml    |  39 +-
> > > >>>    include/uapi/drm/msm_drm.h                         |   5 +-
> > > >>>    20 files changed, 1117 insertions(+), 66 deletions(-)
> > > >>> ---
> > > >>> base-commit: 7c626ce4bae1ac14f60076d00eafe71af30450ba
> > > >>> change-id: 20240815-preemption-a750-t-fcee9a844b39
> > > >>>
> > > >>> Best regards,
> > > >>
> > > >> I've been running vulkan-cts (1.3.7.3-0-gd71a36db16d98313c431829432a136dbda692a08 from Yocto)
> > > >> on SM8650-QRD, SM8550-QRD & SM8450-HDK boards with enable_preemption in default value
> > > >> and forced to 1, and I've seen no regression so far
> > > >>
> > > >> On SM8550, I've seen a few:
> > > >> platform 3d6a000.gmu: [drm:a6xx_hfi_send_msg.constprop.0 [msm]] *ERROR* Message HFI_H2F_MSG_GX_BW_PERF_VOTE id 2743 timed out waiting for response
> > > >> platform 3d6a000.gmu: [drm:a6xx_hfi_send_msg.constprop.0 [msm]] *ERROR* Unexpected message id 2743 on the response queue
> > > >> but it's unrelated to preempt
> > > >>
> > > >> and on SM8450:
> > > >> platform 3d6a000.gmu: [drm:a6xx_gmu_set_oob [msm]] *ERROR* Timeout waiting for GMU OOB set GPU_SET: 0x0
> > > >> msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 7.3.0.1: hangcheck detected gpu lockup rb 0!
> > > >> msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 7.3.0.1:     completed fence: 331235
> > > >> msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 7.3.0.1:     submitted fence: 331236
> > > >> adreno 3d00000.gpu: [drm:a6xx_irq [msm]] *ERROR* gpu fault ring 0 fence 50de4 status 00800005 rb 0000/0699 ib1 0000000000000000/0000 ib2 0000000000000000/0000
> > > >> msm_dpu ae01000.display-controller: [drm:recover_worker [msm]] *ERROR* 7.3.0.1: hangcheck recover!
> > > >> msm_dpu ae01000.display-controller: [drm:recover_worker [msm]] *ERROR* 7.3.0.1: offending task: deqp-vk (/usr/lib/vulkan-cts/deqp-vk)
> > > >> msm_dpu ae01000.display-controller: [drm:recover_worker [msm]] *ERROR* 7.3.0.1: hangcheck recover!
> > > >> leading to a VK_ERROR_DEVICE_LOST, but again unrelated to preempt support.
> > > >>
> > > >> So you can also add:
> > > >> Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8550-QRD
> > > >> Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8450-HDK
> > > >>
> > > >
> > > > Niel,
> > > >
> > > > On my x1e device, all submissions were somehow going into only a single
> > > > ring, even the compositor's. Not sure why. So effectively preemption was
> > > > not really exercised. I had to use the "highprio" mesa debug flag on one
> > > > of the two benchmarks I ran to force its submissions to ring 0.
> > >
> > > I think that is because GL applications (so most compositors) run
> > > through zink which does not forward GL preemption to vulkan so yeah, for
> > > GL applications the only way of getting preemption is the debug flag.
> >
> > I guess if it is mesa 24.2.x or newer it would be using the gallium
> > driver.  Which I guess would need xAMBLE stuff wired up.  Outside of
> > fd6_emit_restore() and fd6_gmem.cc there isn't really any state emit
> > in IB1, so I guess it probably wouldn't be too hard to get preemption
> > support wired up.
> >
> > BR,
> > -R
> >
> > > Unfortunately this is not easy to fix in Zink because it creates one
> > > VkDevice at screen creation and uses it for all GL contexts. Since GL
> > > priority is provided per context and at context creation time Zink has
> > > no way of handling this.
> > >
> > > Once TU supports more than one queue it will be possible for Zink to
> > > create one queue per priority then pick one at context creation time.
> > > Doing so would require a new vulkan extension for per queue global
> > > priority. I had started working on this some time ago
> > > https://gitlab.freedesktop.org/antonino/mesa/-/tree/priority_ext?ref_type=heads
> > > but this solution will only be viable once TU can expose more than one
> > > queue.
> > >
>
> Thanks, both of you for the clarification. Glad about the
> improvements planned for both freedreno and zink.

Only lightly tested so far, but
https://gitlab.freedesktop.org/robclark/mesa/-/tree/fd/a7xx-preemption?ref_type=heads
appears to work for getting preemption going on GL.  It needs some
cleanup, but if you want a GL compositor that supports preemption,
you can give it a try.

BR,
-R

> -Akhil.
>
> > > >
> > > > If possible it is a good idea to check the new preemption traces to
> > > > ensure preemption kicks in.
> > > >
> > > > -Akhil
> > > >
> > > >> Thanks,
> > > >> Neil
> > >
> > >
> > > Best regards,
> > > --
> > > Antonino Maniscalco <antomani103@gmail.com>


* Re: [PATCH v4 00/11] Preemption support for A7XX
  2024-09-24 14:47       ` Rob Clark
  2024-09-24 15:22         ` Akhil P Oommen
@ 2024-09-24 17:45         ` Antonino Maniscalco
  1 sibling, 0 replies; 35+ messages in thread
From: Antonino Maniscalco @ 2024-09-24 17:45 UTC (permalink / raw)
  To: Rob Clark
  Cc: Akhil P Oommen, Neil Armstrong, Sean Paul, Konrad Dybcio,
	Abhinav Kumar, Dmitry Baryshkov, Marijn Suijten, David Airlie,
	Daniel Vetter, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, Jonathan Corbet, linux-arm-msm, dri-devel,
	freedreno, linux-kernel, linux-doc, Sharat Masetty

On 9/24/24 4:47 PM, Rob Clark wrote:
> On Tue, Sep 24, 2024 at 4:54 AM Antonino Maniscalco
> <antomani103@gmail.com> wrote:
>>
>> On 9/20/24 7:09 PM, Akhil P Oommen wrote:
>>> On Wed, Sep 18, 2024 at 09:46:33AM +0200, Neil Armstrong wrote:
>>>> Hi,
>>>>
>>>> On 17/09/2024 13:14, Antonino Maniscalco wrote:
>>>>> This series implements preemption for A7XX targets, which allows the GPU to
>>>>> switch to a higher priority ring when work is pushed to it, reducing latency
>>>>> for high priority submissions.
>>>>>
>>>>> This series enables L1 preemption with skip_save_restore which requires
>>>>> the following userspace patches to function:
>>>>>
>>>>> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30544
>>>>>
>>>>> A flag is added to `msm_submitqueue_create` to only allow submissions
>>>>> from compatible userspace to be preempted, therefore maintaining
>>>>> compatibility.
>>>>>
>>>>> Preemption is currently only enabled by default on A750, it can be
>>>>> enabled on other targets through the `enable_preemption` module
>>>>> parameter. This is because more testing is required on other targets.
>>>>>
>>>>> For testing on other HW it is sufficient to set that parameter to a
>>>>> value of 1, then using the branch of mesa linked above, `TU_DEBUG=hiprio`
>>>>> allows to run any application as high priority therefore preempting
>>>>> submissions from other applications.
>>>>>
>>>>> The `msm_gpu_preemption_trigger` and `msm_gpu_preemption_irq` traces
>>>>> added in this series can be used to observe preemption's behavior as
>>>>> well as measuring preemption latency.
>>>>>
>>>>> Some commits from this series are based on a previous series to enable
>>>>> preemption on A6XX targets:
>>>>>
>>>>> https://lkml.kernel.org/1520489185-21828-1-git-send-email-smasetty@codeaurora.org
>>>>>
>>>>> Signed-off-by: Antonino Maniscalco <antomani103@gmail.com>
>>>>> ---
>>>>> Changes in v4:
>>>>> - Added missing register in pwrup list
>>>>> - Removed and rearrange barriers
>>>>> - Renamed `skip_inline_wptr` to `restore_wptr`
>>>>> - Track ctx seqno per ring
>>>>> - Removed secure preempt context
>>>>> - NOP out postamble to disable it instantly
>>>>> - Only emit pwrup reglist once
>>>>> - Document bv_rptr_addr
>>>>> - Removed unused A6XX_PREEMPT_USER_RECORD_SIZE
>>>>> - Set name on preempt record buffer
>>>>> - Link to v3: https://lore.kernel.org/r/20240905-preemption-a750-t-v3-0-fd947699f7bc@gmail.com
>>>>>
>>>>> Changes in v3:
>>>>> - Added documentation about preemption
>>>>> - Use quirks to determine which target supports preemption
>>>>> - Add a module parameter to force disabling or enabling preemption
>>>>> - Clear postamble when profiling
>>>>> - Define A6XX_CP_CONTEXT_SWITCH_CNTL_LEVEL fields in a6xx.xml
>>>>> - Make preemption records MAP_PRIV
>>>>> - Removed user ctx record (NON_PRIV) and patch 2/9 as it's not needed
>>>>>      anymore
>>>>> - Link to v2: https://lore.kernel.org/r/20240830-preemption-a750-t-v2-0-86aeead2cd80@gmail.com
>>>>>
>>>>> Changes in v2:
>>>>> - Added preempt_record_size for X185 in PATCH 3/7
>>>>> - Added patches to reset perf counters
>>>>> - Dropped unused defines
>>>>> - Dropped unused variable (fixes warning)
>>>>> - Only enable preemption on a750
>>>>> - Reject MSM_SUBMITQUEUE_ALLOW_PREEMPT for unsupported targets
>>>>> - Added Akhil's Reviewed-By tags to patches 1/9,2/9,3/9
>>>>> - Added Neil's Tested-By tags
>>>>> - Added explanation for UAPI changes in commit message
>>>>> - Link to v1: https://lore.kernel.org/r/20240815-preemption-a750-t-v1-0-7bda26c34037@gmail.com
>>>>>
>>>>> ---
>>>>> Antonino Maniscalco (11):
>>>>>          drm/msm: Fix bv_fence being used as bv_rptr
>>>>>          drm/msm/A6XX: Track current_ctx_seqno per ring
>>>>>          drm/msm: Add a `preempt_record_size` field
>>>>>          drm/msm: Add CONTEXT_SWITCH_CNTL bitfields
>>>>>          drm/msm/A6xx: Implement preemption for A7XX targets
>>>>>          drm/msm/A6xx: Sync relevant adreno_pm4.xml changes
>>>>>          drm/msm/A6xx: Use posamble to reset counters on preemption
>>>>>          drm/msm/A6xx: Add traces for preemption
>>>>>          drm/msm/A6XX: Add a flag to allow preemption to submitqueue_create
>>>>>          drm/msm/A6xx: Enable preemption for A750
>>>>>          Documentation: document adreno preemption
>>>>>
>>>>>     Documentation/gpu/msm-preemption.rst               |  98 +++++
>>>>>     drivers/gpu/drm/msm/Makefile                       |   1 +
>>>>>     drivers/gpu/drm/msm/adreno/a2xx_gpu.c              |   2 +-
>>>>>     drivers/gpu/drm/msm/adreno/a3xx_gpu.c              |   2 +-
>>>>>     drivers/gpu/drm/msm/adreno/a4xx_gpu.c              |   2 +-
>>>>>     drivers/gpu/drm/msm/adreno/a5xx_gpu.c              |   6 +-
>>>>>     drivers/gpu/drm/msm/adreno/a6xx_catalog.c          |   7 +-
>>>>>     drivers/gpu/drm/msm/adreno/a6xx_gpu.c              | 325 ++++++++++++++-
>>>>>     drivers/gpu/drm/msm/adreno/a6xx_gpu.h              | 174 ++++++++
>>>>>     drivers/gpu/drm/msm/adreno/a6xx_preempt.c          | 440 +++++++++++++++++++++
>>>>>     drivers/gpu/drm/msm/adreno/adreno_gpu.h            |   9 +-
>>>>>     drivers/gpu/drm/msm/msm_drv.c                      |   4 +
>>>>>     drivers/gpu/drm/msm/msm_gpu.c                      |   2 +-
>>>>>     drivers/gpu/drm/msm/msm_gpu.h                      |  11 -
>>>>>     drivers/gpu/drm/msm/msm_gpu_trace.h                |  28 ++
>>>>>     drivers/gpu/drm/msm/msm_ringbuffer.h               |  18 +
>>>>>     drivers/gpu/drm/msm/msm_submitqueue.c              |   3 +
>>>>>     drivers/gpu/drm/msm/registers/adreno/a6xx.xml      |   7 +-
>>>>>     .../gpu/drm/msm/registers/adreno/adreno_pm4.xml    |  39 +-
>>>>>     include/uapi/drm/msm_drm.h                         |   5 +-
>>>>>     20 files changed, 1117 insertions(+), 66 deletions(-)
>>>>> ---
>>>>> base-commit: 7c626ce4bae1ac14f60076d00eafe71af30450ba
>>>>> change-id: 20240815-preemption-a750-t-fcee9a844b39
>>>>>
>>>>> Best regards,
>>>>
>>>> I've been running vulkan-cts (1.3.7.3-0-gd71a36db16d98313c431829432a136dbda692a08 from Yocto)
>>>> on SM8650-QRD, SM8550-QRD & SM8450-HDK boards with enable_preemption in default value
>>>> and forced to 1, and I've seen no regression so far
>>>>
>>>> On SM8550, I've seen a few:
>>>> platform 3d6a000.gmu: [drm:a6xx_hfi_send_msg.constprop.0 [msm]] *ERROR* Message HFI_H2F_MSG_GX_BW_PERF_VOTE id 2743 timed out waiting for response
>>>> platform 3d6a000.gmu: [drm:a6xx_hfi_send_msg.constprop.0 [msm]] *ERROR* Unexpected message id 2743 on the response queue
>>>> but it's unrelated to preempt
>>>>
>>>> and on SM8450:
>>>> platform 3d6a000.gmu: [drm:a6xx_gmu_set_oob [msm]] *ERROR* Timeout waiting for GMU OOB set GPU_SET: 0x0
>>>> msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 7.3.0.1: hangcheck detected gpu lockup rb 0!
>>>> msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 7.3.0.1:     completed fence: 331235
>>>> msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 7.3.0.1:     submitted fence: 331236
>>>> adreno 3d00000.gpu: [drm:a6xx_irq [msm]] *ERROR* gpu fault ring 0 fence 50de4 status 00800005 rb 0000/0699 ib1 0000000000000000/0000 ib2 0000000000000000/0000
>>>> msm_dpu ae01000.display-controller: [drm:recover_worker [msm]] *ERROR* 7.3.0.1: hangcheck recover!
>>>> msm_dpu ae01000.display-controller: [drm:recover_worker [msm]] *ERROR* 7.3.0.1: offending task: deqp-vk (/usr/lib/vulkan-cts/deqp-vk)
>>>> msm_dpu ae01000.display-controller: [drm:recover_worker [msm]] *ERROR* 7.3.0.1: hangcheck recover!
>>>> leading to a VK_ERROR_DEVICE_LOST, but again unrelated to preempt support.
>>>>
>>>> So you can also add:
>>>> Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8550-QRD
>>>> Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8450-HDK
>>>>
>>>
>>> Neil,
>>>
>>> On my x1e device, all submissions were somehow going into only a single
>>> ring, even the compositor's. Not sure why. So effectively preemption was
>>> not really exercised. I had to use the "highprio" mesa debug flag on one
>>> of the two benchmarks I ran to force its submissions to ring 0.
>>
>> I think that is because GL applications (so most compositors) run
>> through zink which does not forward GL preemption to vulkan so yeah, for
>> GL applications the only way of getting preemption is the debug flag.
> 
> I guess if it is mesa 24.2.x or newer it would be using the gallium
> driver.  Which I guess would need xAMBLE stuff wired up.  Outside of
> fd6_emit_restore() and fd6_gmem.cc there isn't really any state emit
> in IB1, so I guess it probably wouldn't be too hard to get preemption
> support wired up.

I hadn't realized a7xx support had landed for the gallium driver.
That's good news! This is definitely a shorter path towards getting 
compositors to use preemption.

> 
> BR,
> -R
> 
>> Unfortunately this is not easy to fix in Zink because it creates one
>> VkDevice at screen creation and uses it for all GL contexts. Since GL
>> priority is provided per context and at context creation time Zink has
>> no way of handling this.
>>
>> Once TU supports more than one queue it will be possible for Zink to
>> create one queue per priority then pick one at context creation time.
>> Doing so would require a new vulkan extension for per queue global
>> priority. I had started working on this some time ago
>> https://gitlab.freedesktop.org/antonino/mesa/-/tree/priority_ext?ref_type=heads
>> but this solution will only be viable once TU can expose more than one
>> queue.
>>
>>>
>>> If possible it is a good idea to check the new preemption traces to
>>> ensure preemption kicks in.
>>>
>>> -Akhil
>>>
>>>> Thanks,
>>>> Neil
>>
>>
>> Best regards,
>> --
>> Antonino Maniscalco <antomani103@gmail.com>


Best regards,
-- 
Antonino Maniscalco <antomani103@gmail.com>


end of thread, other threads:[~2024-10-23 16:22 UTC | newest]

Thread overview: 35+ messages
2024-09-17 11:14 [PATCH v4 00/11] Preemption support for A7XX Antonino Maniscalco
2024-09-17 11:14 ` [PATCH v4 01/11] drm/msm: Fix bv_fence being used as bv_rptr Antonino Maniscalco
2024-09-17 11:14 ` [PATCH v4 02/11] drm/msm/A6XX: Track current_ctx_seqno per ring Antonino Maniscalco
2024-09-17 11:14 ` [PATCH v4 03/11] drm/msm: Add a `preempt_record_size` field Antonino Maniscalco
2024-09-17 11:14 ` [PATCH v4 04/11] drm/msm: Add CONTEXT_SWITCH_CNTL bitfields Antonino Maniscalco
2024-09-17 11:14 ` [PATCH v4 05/11] drm/msm/A6xx: Implement preemption for A7XX targets Antonino Maniscalco
2024-09-20 16:37   ` Akhil P Oommen
2024-09-17 11:14 ` [PATCH v4 06/11] drm/msm/A6xx: Sync relevant adreno_pm4.xml changes Antonino Maniscalco
2024-09-17 11:14 ` [PATCH v4 07/11] drm/msm/A6xx: Use posamble to reset counters on preemption Antonino Maniscalco
2024-09-20 16:43   ` Akhil P Oommen
2024-09-17 11:14 ` [PATCH v4 08/11] drm/msm/A6xx: Add traces for preemption Antonino Maniscalco
2024-09-20 16:50   ` Akhil P Oommen
2024-09-17 11:14 ` [PATCH v4 09/11] drm/msm/A6XX: Add a flag to allow preemption to submitqueue_create Antonino Maniscalco
2024-09-20 16:54   ` Akhil P Oommen
2024-09-20 17:29     ` Rob Clark
2024-09-23 19:44       ` Akhil P Oommen
2024-09-17 11:14 ` [PATCH v4 10/11] drm/msm/A6xx: Enable preemption for A750 Antonino Maniscalco
2024-09-20 17:02   ` Akhil P Oommen
2024-09-17 11:14 ` [PATCH v4 11/11] Documentation: document adreno preemption Antonino Maniscalco
2024-09-19 14:19   ` Connor Abbott
2024-09-18  7:46 ` [PATCH v4 00/11] Preemption support for A7XX Neil Armstrong
2024-09-18 11:20   ` Antonino Maniscalco
2024-09-18 15:39   ` Rob Clark
2024-09-20 16:14     ` Akhil P Oommen
2024-09-20 17:17       ` Rob Clark
2024-10-22 15:05       ` Rob Clark
2024-10-23 16:21         ` Akhil P Oommen
2024-09-20 17:09   ` Akhil P Oommen
2024-09-21 16:45     ` Neil Armstrong
2024-09-24 11:53     ` Antonino Maniscalco
2024-09-24 14:47       ` Rob Clark
2024-09-24 15:22         ` Akhil P Oommen
2024-09-24 17:47           ` Antonino Maniscalco
2024-09-25 19:37           ` Rob Clark
2024-09-24 17:45         ` Antonino Maniscalco
