* [Intel-gfx] [PATCH v4 00/16] Add DG2 OA support
@ 2022-10-12 22:27 Umesh Nerlige Ramappa
2022-10-12 22:27 ` [Intel-gfx] [PATCH v4 01/16] drm/i915/perf: Fix OA filtering logic for GuC mode Umesh Nerlige Ramappa
` (19 more replies)
0 siblings, 20 replies; 23+ messages in thread
From: Umesh Nerlige Ramappa @ 2022-10-12 22:27 UTC (permalink / raw)
To: intel-gfx, Lionel G Landwerlin, Ashutosh Dixit
Add OA format support for DG2 and various fixes for DG2.
This series has 2 uapi changes listed below:
1) drm/i915/perf: Add OAG and OAR formats for DG2
DG2 has new OA formats defined that can be selected by the
user. The UMD changes that are consumed by GPUvis are:
https://patchwork.freedesktop.org/patch/504456/?series=107633&rev=5
2) drm/i915/perf: Apply Wa_18013179988
DG2 has a bug where the OA timestamp does not tick at the CS timestamp
frequency. Instead it ticks at a multiple that is determined from the
CTC_SHIFT value in RPM_CONFIG. Since the timestamp is used by UMD to
make sense of all the counters in the report, expose the OA timestamp
frequency to the user. The interface is generic and applies to all
platforms. On platforms where the bug is not present, this returns the
CS timestamp frequency. UMD specific changes consumed by GPUvis are:
https://patchwork.freedesktop.org/patch/504464/?series=107633&rev=5
v2:
- Add review comments
- Update uapi changes in cover letter
- Drop patches for non-production platforms
drm/i915/perf: Use helpers to process reports w.r.t. OA buffer size
drm/i915/perf: Add Wa_16010703925:dg2
- Drop 64-bit OA format changes for now
drm/i915/perf: Parse 64bit report header formats correctly
drm/i915/perf: Add Wa_1608133521:dg2
v3:
- Add review comments to patches 02, 04, 05, 14
- Drop Acks
v4:
- Add review comments to patch 04
- Update R-bs
- Add MR links to patches 02 and 12
Test-with: 20220923195224.283045-1-umesh.nerlige.ramappa@intel.com
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Lionel Landwerlin (1):
drm/i915/perf: complete programming whitelisting for XEHPSDV
Umesh Nerlige Ramappa (14):
drm/i915/perf: Fix OA filtering logic for GuC mode
drm/i915/perf: Add 32-bit OAG and OAR formats for DG2
drm/i915/perf: Fix noa wait predication for DG2
drm/i915/perf: Determine gen12 oa ctx offset at runtime
drm/i915/perf: Enable bytes per clock reporting in OA
drm/i915/perf: Simply use stream->ctx
drm/i915/perf: Move gt-specific data from i915->perf to gt->perf
drm/i915/perf: Replace gt->perf.lock with stream->lock for file ops
drm/i915/perf: Use gt-specific ggtt for OA and noa-wait buffers
drm/i915/perf: Store a pointer to oa_format in oa_buffer
drm/i915/perf: Add Wa_1508761755:dg2
drm/i915/perf: Apply Wa_18013179988
drm/i915/perf: Save/restore EU flex counters across reset
drm/i915/perf: Enable OA for DG2
Vinay Belgaumkar (1):
drm/i915/guc: Support OA when Wa_16011777198 is enabled
drivers/gpu/drm/i915/gt/intel_engine_regs.h | 1 +
drivers/gpu/drm/i915/gt/intel_gpu_commands.h | 4 +
drivers/gpu/drm/i915/gt/intel_gt_regs.h | 1 +
drivers/gpu/drm/i915/gt/intel_gt_types.h | 3 +
drivers/gpu/drm/i915/gt/intel_lrc.h | 2 +
drivers/gpu/drm/i915/gt/intel_sseu.c | 4 +-
.../drm/i915/gt/uc/abi/guc_actions_slpc_abi.h | 9 +
drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c | 8 +
drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c | 66 ++
drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.h | 2 +
drivers/gpu/drm/i915/i915_drv.h | 5 +
drivers/gpu/drm/i915/i915_getparam.c | 3 +
drivers/gpu/drm/i915/i915_pci.c | 2 +
drivers/gpu/drm/i915/i915_perf.c | 579 ++++++++++++++----
drivers/gpu/drm/i915/i915_perf.h | 2 +
drivers/gpu/drm/i915/i915_perf_oa_regs.h | 6 +-
drivers/gpu/drm/i915/i915_perf_types.h | 47 +-
drivers/gpu/drm/i915/intel_device_info.h | 2 +
drivers/gpu/drm/i915/selftests/i915_perf.c | 16 +-
include/uapi/drm/i915_drm.h | 10 +
20 files changed, 631 insertions(+), 141 deletions(-)
--
2.25.1
^ permalink raw reply [flat|nested] 23+ messages in thread
* [Intel-gfx] [PATCH v4 01/16] drm/i915/perf: Fix OA filtering logic for GuC mode
2022-10-12 22:27 [Intel-gfx] [PATCH v4 00/16] Add DG2 OA support Umesh Nerlige Ramappa
@ 2022-10-12 22:27 ` Umesh Nerlige Ramappa
2022-10-12 22:27 ` [Intel-gfx] [PATCH v4 02/16] drm/i915/perf: Add 32-bit OAG and OAR formats for DG2 Umesh Nerlige Ramappa
` (18 subsequent siblings)
19 siblings, 0 replies; 23+ messages in thread
From: Umesh Nerlige Ramappa @ 2022-10-12 22:27 UTC (permalink / raw)
To: intel-gfx, Lionel G Landwerlin, Ashutosh Dixit
With GuC mode of submission, GuC is in control of defining the context
id field that is part of the OA reports. To filter reports, UMD and KMD
must know what sw context id was chosen by GuC. There is not interface
between KMD and GuC to determine this, so read the upper-dword of
EXECLIST_STATUS to filter/squash OA reports for the specific context.
v2: Explain guc id stealing w.r.t OA use case
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
---
drivers/gpu/drm/i915/gt/intel_lrc.h | 2 +
drivers/gpu/drm/i915/i915_perf.c | 144 ++++++++++++++++++++++++----
2 files changed, 127 insertions(+), 19 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.h b/drivers/gpu/drm/i915/gt/intel_lrc.h
index a390f0813c8b..7111bae759f3 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.h
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.h
@@ -110,6 +110,8 @@ enum {
#define XEHP_SW_CTX_ID_WIDTH 16
#define XEHP_SW_COUNTER_SHIFT 58
#define XEHP_SW_COUNTER_WIDTH 6
+#define GEN12_GUC_SW_CTX_ID_SHIFT 39
+#define GEN12_GUC_SW_CTX_ID_WIDTH 16
static inline void lrc_runtime_start(struct intel_context *ce)
{
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 15816df916c7..255335868b6a 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -1231,6 +1231,128 @@ static struct intel_context *oa_pin_context(struct i915_perf_stream *stream)
return stream->pinned_ctx;
}
+static int
+__store_reg_to_mem(struct i915_request *rq, i915_reg_t reg, u32 ggtt_offset)
+{
+ u32 *cs, cmd;
+
+ cmd = MI_STORE_REGISTER_MEM | MI_SRM_LRM_GLOBAL_GTT;
+ if (GRAPHICS_VER(rq->engine->i915) >= 8)
+ cmd++;
+
+ cs = intel_ring_begin(rq, 4);
+ if (IS_ERR(cs))
+ return PTR_ERR(cs);
+
+ *cs++ = cmd;
+ *cs++ = i915_mmio_reg_offset(reg);
+ *cs++ = ggtt_offset;
+ *cs++ = 0;
+
+ intel_ring_advance(rq, cs);
+
+ return 0;
+}
+
+static int
+__read_reg(struct intel_context *ce, i915_reg_t reg, u32 ggtt_offset)
+{
+ struct i915_request *rq;
+ int err;
+
+ rq = i915_request_create(ce);
+ if (IS_ERR(rq))
+ return PTR_ERR(rq);
+
+ i915_request_get(rq);
+
+ err = __store_reg_to_mem(rq, reg, ggtt_offset);
+
+ i915_request_add(rq);
+ if (!err && i915_request_wait(rq, 0, HZ / 2) < 0)
+ err = -ETIME;
+
+ i915_request_put(rq);
+
+ return err;
+}
+
+static int
+gen12_guc_sw_ctx_id(struct intel_context *ce, u32 *ctx_id)
+{
+ struct i915_vma *scratch;
+ u32 *val;
+ int err;
+
+ scratch = __vm_create_scratch_for_read_pinned(&ce->engine->gt->ggtt->vm, 4);
+ if (IS_ERR(scratch))
+ return PTR_ERR(scratch);
+
+ err = i915_vma_sync(scratch);
+ if (err)
+ goto err_scratch;
+
+ err = __read_reg(ce, RING_EXECLIST_STATUS_HI(ce->engine->mmio_base),
+ i915_ggtt_offset(scratch));
+ if (err)
+ goto err_scratch;
+
+ val = i915_gem_object_pin_map_unlocked(scratch->obj, I915_MAP_WB);
+ if (IS_ERR(val)) {
+ err = PTR_ERR(val);
+ goto err_scratch;
+ }
+
+ *ctx_id = *val;
+ i915_gem_object_unpin_map(scratch->obj);
+
+err_scratch:
+ i915_vma_unpin_and_release(&scratch, 0);
+ return err;
+}
+
+/*
+ * For execlist mode of submission, pick an unused context id
+ * 0 - (NUM_CONTEXT_TAG -1) are used by other contexts
+ * XXX_MAX_CONTEXT_HW_ID is used by idle context
+ *
+ * For GuC mode of submission read context id from the upper dword of the
+ * EXECLIST_STATUS register. Note that we read this value only once and expect
+ * that the value stays fixed for the entire OA use case. There are cases where
+ * GuC KMD implementation may deregister a context to reuse it's context id, but
+ * we prevent that from happening to the OA context by pinning it.
+ */
+static int gen12_get_render_context_id(struct i915_perf_stream *stream)
+{
+ u32 ctx_id, mask;
+ int ret;
+
+ if (intel_engine_uses_guc(stream->engine)) {
+ ret = gen12_guc_sw_ctx_id(stream->pinned_ctx, &ctx_id);
+ if (ret)
+ return ret;
+
+ mask = ((1U << GEN12_GUC_SW_CTX_ID_WIDTH) - 1) <<
+ (GEN12_GUC_SW_CTX_ID_SHIFT - 32);
+ } else if (GRAPHICS_VER_FULL(stream->engine->i915) >= IP_VER(12, 50)) {
+ ctx_id = (XEHP_MAX_CONTEXT_HW_ID - 1) <<
+ (XEHP_SW_CTX_ID_SHIFT - 32);
+
+ mask = ((1U << XEHP_SW_CTX_ID_WIDTH) - 1) <<
+ (XEHP_SW_CTX_ID_SHIFT - 32);
+ } else {
+ ctx_id = (GEN12_MAX_CONTEXT_HW_ID - 1) <<
+ (GEN11_SW_CTX_ID_SHIFT - 32);
+
+ mask = ((1U << GEN11_SW_CTX_ID_WIDTH) - 1) <<
+ (GEN11_SW_CTX_ID_SHIFT - 32);
+ }
+ stream->specific_ctx_id = ctx_id & mask;
+ stream->specific_ctx_id_mask = mask;
+
+ return 0;
+}
+
/**
* oa_get_render_ctx_id - determine and hold ctx hw id
* @stream: An i915-perf stream opened for OA metrics
@@ -1244,6 +1366,7 @@ static struct intel_context *oa_pin_context(struct i915_perf_stream *stream)
static int oa_get_render_ctx_id(struct i915_perf_stream *stream)
{
struct intel_context *ce;
+ int ret = 0;
ce = oa_pin_context(stream);
if (IS_ERR(ce))
@@ -1290,24 +1413,7 @@ static int oa_get_render_ctx_id(struct i915_perf_stream *stream)
case 11:
case 12:
- if (GRAPHICS_VER_FULL(ce->engine->i915) >= IP_VER(12, 50)) {
- stream->specific_ctx_id_mask =
- ((1U << XEHP_SW_CTX_ID_WIDTH) - 1) <<
- (XEHP_SW_CTX_ID_SHIFT - 32);
- stream->specific_ctx_id =
- (XEHP_MAX_CONTEXT_HW_ID - 1) <<
- (XEHP_SW_CTX_ID_SHIFT - 32);
- } else {
- stream->specific_ctx_id_mask =
- ((1U << GEN11_SW_CTX_ID_WIDTH) - 1) << (GEN11_SW_CTX_ID_SHIFT - 32);
- /*
- * Pick an unused context id
- * 0 - BITS_PER_LONG are used by other contexts
- * GEN12_MAX_CONTEXT_HW_ID (0x7ff) is used by idle context
- */
- stream->specific_ctx_id =
- (GEN12_MAX_CONTEXT_HW_ID - 1) << (GEN11_SW_CTX_ID_SHIFT - 32);
- }
+ ret = gen12_get_render_context_id(stream);
break;
default:
@@ -1321,7 +1427,7 @@ static int oa_get_render_ctx_id(struct i915_perf_stream *stream)
stream->specific_ctx_id,
stream->specific_ctx_id_mask);
- return 0;
+ return ret;
}
/**
--
2.25.1
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [Intel-gfx] [PATCH v4 02/16] drm/i915/perf: Add 32-bit OAG and OAR formats for DG2
2022-10-12 22:27 [Intel-gfx] [PATCH v4 00/16] Add DG2 OA support Umesh Nerlige Ramappa
2022-10-12 22:27 ` [Intel-gfx] [PATCH v4 01/16] drm/i915/perf: Fix OA filtering logic for GuC mode Umesh Nerlige Ramappa
@ 2022-10-12 22:27 ` Umesh Nerlige Ramappa
2022-10-12 22:27 ` [Intel-gfx] [PATCH v4 03/16] drm/i915/perf: Fix noa wait predication " Umesh Nerlige Ramappa
` (17 subsequent siblings)
19 siblings, 0 replies; 23+ messages in thread
From: Umesh Nerlige Ramappa @ 2022-10-12 22:27 UTC (permalink / raw)
To: intel-gfx, Lionel G Landwerlin, Ashutosh Dixit
Add new OA formats for DG2.
MR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18893
v2:
- Update commit title (Ashutosh)
- Coding style fixes (Lionel)
- 64 bit OA formats need UMD changes in GPUvis, drop for now and send in a
separate series with UMD changes
v3:
- Update commit message to drop 64 bit related description
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> #1
---
drivers/gpu/drm/i915/i915_perf.c | 7 +++++++
include/uapi/drm/i915_drm.h | 4 ++++
2 files changed, 11 insertions(+)
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 255335868b6a..2b772a6b1cd6 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -320,6 +320,8 @@ static const struct i915_oa_format oa_formats[I915_OA_FORMAT_MAX] = {
[I915_OA_FORMAT_A12] = { 0, 64 },
[I915_OA_FORMAT_A12_B8_C8] = { 2, 128 },
[I915_OA_FORMAT_A32u40_A4u32_B8_C8] = { 5, 256 },
+ [I915_OAR_FORMAT_A32u40_A4u32_B8_C8] = { 5, 256 },
+ [I915_OA_FORMAT_A24u40_A14u32_B8_C8] = { 5, 256 },
};
#define SAMPLE_OA_REPORT (1<<0)
@@ -4515,6 +4517,11 @@ static void oa_init_supported_formats(struct i915_perf *perf)
oa_format_add(perf, I915_OA_FORMAT_C4_B8);
break;
+ case INTEL_DG2:
+ oa_format_add(perf, I915_OAR_FORMAT_A32u40_A4u32_B8_C8);
+ oa_format_add(perf, I915_OA_FORMAT_A24u40_A14u32_B8_C8);
+ break;
+
default:
MISSING_CASE(platform);
}
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 629198f1d8d8..7cad631040d3 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -2666,6 +2666,10 @@ enum drm_i915_oa_format {
I915_OA_FORMAT_A12_B8_C8,
I915_OA_FORMAT_A32u40_A4u32_B8_C8,
+ /* DG2 */
+ I915_OAR_FORMAT_A32u40_A4u32_B8_C8,
+ I915_OA_FORMAT_A24u40_A14u32_B8_C8,
+
I915_OA_FORMAT_MAX /* non-ABI */
};
--
2.25.1
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [Intel-gfx] [PATCH v4 03/16] drm/i915/perf: Fix noa wait predication for DG2
2022-10-12 22:27 [Intel-gfx] [PATCH v4 00/16] Add DG2 OA support Umesh Nerlige Ramappa
2022-10-12 22:27 ` [Intel-gfx] [PATCH v4 01/16] drm/i915/perf: Fix OA filtering logic for GuC mode Umesh Nerlige Ramappa
2022-10-12 22:27 ` [Intel-gfx] [PATCH v4 02/16] drm/i915/perf: Add 32-bit OAG and OAR formats for DG2 Umesh Nerlige Ramappa
@ 2022-10-12 22:27 ` Umesh Nerlige Ramappa
2022-10-12 22:27 ` [Intel-gfx] [PATCH v4 04/16] drm/i915/perf: Determine gen12 oa ctx offset at runtime Umesh Nerlige Ramappa
` (16 subsequent siblings)
19 siblings, 0 replies; 23+ messages in thread
From: Umesh Nerlige Ramappa @ 2022-10-12 22:27 UTC (permalink / raw)
To: intel-gfx, Lionel G Landwerlin, Ashutosh Dixit
Predication for batch buffer commands changed in XEHPSDV.
MI_BATCH_BUFFER_START predicates based on MI_SET_PREDICATE_RESULT
register. The MI_SET_PREDICATE_RESULT register can only be modified
with MI_SET_PREDICATE command. When configured, the MI_SET_PREDICATE
command sets MI_SET_PREDICATE_RESULT based on bit 0 of
MI_PREDICATE_RESULT_2. Use this to configure predication in noa_wait.
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
---
drivers/gpu/drm/i915/gt/intel_engine_regs.h | 1 +
drivers/gpu/drm/i915/i915_perf.c | 24 +++++++++++++++++----
2 files changed, 21 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_regs.h b/drivers/gpu/drm/i915/gt/intel_engine_regs.h
index fe1a0d5fd4b1..ee3efd06ee54 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_regs.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_regs.h
@@ -201,6 +201,7 @@
#define RING_CONTEXT_STATUS_PTR(base) _MMIO((base) + 0x3a0)
#define RING_CTX_TIMESTAMP(base) _MMIO((base) + 0x3a8) /* gen8+ */
#define RING_PREDICATE_RESULT(base) _MMIO((base) + 0x3b8)
+#define MI_PREDICATE_RESULT_2_ENGINE(base) _MMIO((base) + 0x3bc)
#define RING_FORCE_TO_NONPRIV(base, i) _MMIO(((base) + 0x4D0) + (i) * 4)
#define RING_FORCE_TO_NONPRIV_DENY REG_BIT(30)
#define RING_FORCE_TO_NONPRIV_ADDRESS_MASK REG_GENMASK(25, 2)
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 2b772a6b1cd6..e68666b44a72 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -286,6 +286,7 @@ static u32 i915_perf_stream_paranoid = true;
#define OAREPORT_REASON_CTX_SWITCH (1<<3)
#define OAREPORT_REASON_CLK_RATIO (1<<5)
+#define HAS_MI_SET_PREDICATE(i915) (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 50))
/* For sysctl proc_dointvec_minmax of i915_oa_max_sample_rate
*
@@ -1760,6 +1761,9 @@ static int alloc_noa_wait(struct i915_perf_stream *stream)
DELTA_TARGET,
N_CS_GPR
};
+ i915_reg_t mi_predicate_result = HAS_MI_SET_PREDICATE(i915) ?
+ MI_PREDICATE_RESULT_2_ENGINE(base) :
+ MI_PREDICATE_RESULT_1(RENDER_RING_BASE);
bo = i915_gem_object_create_internal(i915, 4096);
if (IS_ERR(bo)) {
@@ -1797,7 +1801,7 @@ static int alloc_noa_wait(struct i915_perf_stream *stream)
stream, cs, true /* save */, CS_GPR(i),
INTEL_GT_SCRATCH_FIELD_PERF_CS_GPR + 8 * i, 2);
cs = save_restore_register(
- stream, cs, true /* save */, MI_PREDICATE_RESULT_1(RENDER_RING_BASE),
+ stream, cs, true /* save */, mi_predicate_result,
INTEL_GT_SCRATCH_FIELD_PERF_PREDICATE_RESULT_1, 1);
/* First timestamp snapshot location. */
@@ -1851,7 +1855,10 @@ static int alloc_noa_wait(struct i915_perf_stream *stream)
*/
*cs++ = MI_LOAD_REGISTER_REG | (3 - 2);
*cs++ = i915_mmio_reg_offset(CS_GPR(JUMP_PREDICATE));
- *cs++ = i915_mmio_reg_offset(MI_PREDICATE_RESULT_1(RENDER_RING_BASE));
+ *cs++ = i915_mmio_reg_offset(mi_predicate_result);
+
+ if (HAS_MI_SET_PREDICATE(i915))
+ *cs++ = MI_SET_PREDICATE | 1;
/* Restart from the beginning if we had timestamps roll over. */
*cs++ = (GRAPHICS_VER(i915) < 8 ?
@@ -1861,6 +1868,9 @@ static int alloc_noa_wait(struct i915_perf_stream *stream)
*cs++ = i915_ggtt_offset(vma) + (ts0 - batch) * 4;
*cs++ = 0;
+ if (HAS_MI_SET_PREDICATE(i915))
+ *cs++ = MI_SET_PREDICATE;
+
/*
* Now add the diff between to previous timestamps and add it to :
* (((1 * << 64) - 1) - delay_ns)
@@ -1888,7 +1898,10 @@ static int alloc_noa_wait(struct i915_perf_stream *stream)
*/
*cs++ = MI_LOAD_REGISTER_REG | (3 - 2);
*cs++ = i915_mmio_reg_offset(CS_GPR(JUMP_PREDICATE));
- *cs++ = i915_mmio_reg_offset(MI_PREDICATE_RESULT_1(RENDER_RING_BASE));
+ *cs++ = i915_mmio_reg_offset(mi_predicate_result);
+
+ if (HAS_MI_SET_PREDICATE(i915))
+ *cs++ = MI_SET_PREDICATE | 1;
/* Predicate the jump. */
*cs++ = (GRAPHICS_VER(i915) < 8 ?
@@ -1898,13 +1911,16 @@ static int alloc_noa_wait(struct i915_perf_stream *stream)
*cs++ = i915_ggtt_offset(vma) + (jump - batch) * 4;
*cs++ = 0;
+ if (HAS_MI_SET_PREDICATE(i915))
+ *cs++ = MI_SET_PREDICATE;
+
/* Restore registers. */
for (i = 0; i < N_CS_GPR; i++)
cs = save_restore_register(
stream, cs, false /* restore */, CS_GPR(i),
INTEL_GT_SCRATCH_FIELD_PERF_CS_GPR + 8 * i, 2);
cs = save_restore_register(
- stream, cs, false /* restore */, MI_PREDICATE_RESULT_1(RENDER_RING_BASE),
+ stream, cs, false /* restore */, mi_predicate_result,
INTEL_GT_SCRATCH_FIELD_PERF_PREDICATE_RESULT_1, 1);
/* And return to the ring. */
--
2.25.1
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [Intel-gfx] [PATCH v4 04/16] drm/i915/perf: Determine gen12 oa ctx offset at runtime
2022-10-12 22:27 [Intel-gfx] [PATCH v4 00/16] Add DG2 OA support Umesh Nerlige Ramappa
` (2 preceding siblings ...)
2022-10-12 22:27 ` [Intel-gfx] [PATCH v4 03/16] drm/i915/perf: Fix noa wait predication " Umesh Nerlige Ramappa
@ 2022-10-12 22:27 ` Umesh Nerlige Ramappa
2022-10-12 23:46 ` Dixit, Ashutosh
2022-10-12 22:27 ` [Intel-gfx] [PATCH v4 05/16] drm/i915/perf: Enable bytes per clock reporting in OA Umesh Nerlige Ramappa
` (15 subsequent siblings)
19 siblings, 1 reply; 23+ messages in thread
From: Umesh Nerlige Ramappa @ 2022-10-12 22:27 UTC (permalink / raw)
To: intel-gfx, Lionel G Landwerlin, Ashutosh Dixit
Some SKUs of same gen12 platform may have different oactxctrl
offsets. For gen12, determine oactxctrl offsets at runtime.
v2: (Lionel)
- Move MI definitions to intel_gpu_commands.h
- Ensure __find_reg_in_lri does read past context image size
v3: (Ashutosh)
- Drop unnecessary use of double underscores
- fix find_reg_in_lri
- Return error if oa context offset is U32_MAX
- Error out if oa_ctx_ctrl_offset does not find offset
v4: (Ashutosh)
- Warn on odd MI LRI_LEN
- Remove unnecessary check for valid_oactxctrl_offset
- Drop valid_oactxctrl_offset macro
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
---
drivers/gpu/drm/i915/gt/intel_gpu_commands.h | 4 +
drivers/gpu/drm/i915/i915_perf.c | 150 ++++++++++++++++---
drivers/gpu/drm/i915/i915_perf_oa_regs.h | 2 +-
3 files changed, 131 insertions(+), 25 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
index d4e9702d3c8e..f50ea92910d9 100644
--- a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
+++ b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
@@ -187,6 +187,10 @@
#define MI_BATCH_RESOURCE_STREAMER REG_BIT(10)
#define MI_BATCH_PREDICATE REG_BIT(15) /* HSW+ on RCS only*/
+#define MI_OPCODE(x) (((x) >> 23) & 0x3f)
+#define IS_MI_LRI_CMD(x) (MI_OPCODE(x) == MI_OPCODE(MI_INSTR(0x22, 0)))
+#define MI_LRI_LEN(x) (((x) & 0xff) + 1)
+
/*
* 3D instructions used by the kernel
*/
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index e68666b44a72..0ba61daedf6d 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -1356,6 +1356,78 @@ static int gen12_get_render_context_id(struct i915_perf_stream *stream)
return 0;
}
+static bool oa_find_reg_in_lri(u32 *state, u32 reg, u32 *offset, u32 end)
+{
+ u32 idx = *offset;
+ u32 len = min(MI_LRI_LEN(state[idx]) + idx, end);
+ bool found = false;
+
+ idx++;
+ for (; idx < len; idx += 2) {
+ if (state[idx] == reg) {
+ found = true;
+ break;
+ }
+ }
+
+ *offset = idx;
+ return found;
+}
+
+static u32 oa_context_image_offset(struct intel_context *ce, u32 reg)
+{
+ u32 offset, len = (ce->engine->context_size - PAGE_SIZE) / 4;
+ u32 *state = ce->lrc_reg_state;
+
+ for (offset = 0; offset < len; ) {
+ if (IS_MI_LRI_CMD(state[offset])) {
+ /*
+ * We expect reg-value pairs in MI_LRI command, so
+ * MI_LRI_LEN() should be even, if not, issue a warning.
+ */
+ drm_WARN_ON(&ce->engine->i915->drm,
+ MI_LRI_LEN(state[offset]) & 0x1);
+
+ if (oa_find_reg_in_lri(state, reg, &offset, len))
+ break;
+ } else {
+ offset++;
+ }
+ }
+
+ /*
+ * Note that the the reg value is written to 'offset + 1' location, so
+ * ensure that the offset is less than (len - 1).
+ */
+ return offset < len ? offset : U32_MAX;
+}
+
+static int set_oa_ctx_ctrl_offset(struct intel_context *ce)
+{
+ i915_reg_t reg = GEN12_OACTXCONTROL(ce->engine->mmio_base);
+ struct i915_perf *perf = &ce->engine->i915->perf;
+ u32 offset = perf->ctx_oactxctrl_offset;
+
+ /* Do this only once. Failure is stored as offset of U32_MAX */
+ if (offset)
+ goto exit;
+
+ offset = oa_context_image_offset(ce, i915_mmio_reg_offset(reg));
+ perf->ctx_oactxctrl_offset = offset;
+
+ drm_dbg(&ce->engine->i915->drm,
+ "%s oa ctx control at 0x%08x dword offset\n",
+ ce->engine->name, offset);
+
+exit:
+ return offset && offset != U32_MAX ? 0 : -ENODEV;
+}
+
+static bool engine_supports_mi_query(struct intel_engine_cs *engine)
+{
+ return engine->class == RENDER_CLASS;
+}
+
/**
* oa_get_render_ctx_id - determine and hold ctx hw id
* @stream: An i915-perf stream opened for OA metrics
@@ -1375,6 +1447,21 @@ static int oa_get_render_ctx_id(struct i915_perf_stream *stream)
if (IS_ERR(ce))
return PTR_ERR(ce);
+ if (engine_supports_mi_query(stream->engine)) {
+ /*
+ * We are enabling perf query here. If we don't find the context
+ * offset here, just return an error.
+ */
+ ret = set_oa_ctx_ctrl_offset(ce);
+ if (ret) {
+ intel_context_unpin(ce);
+ drm_err(&stream->perf->i915->drm,
+ "Enabling perf query failed for %s\n",
+ stream->engine->name);
+ return ret;
+ }
+ }
+
switch (GRAPHICS_VER(ce->engine->i915)) {
case 7: {
/*
@@ -2406,10 +2493,11 @@ static int gen12_configure_oar_context(struct i915_perf_stream *stream,
int err;
struct intel_context *ce = stream->pinned_ctx;
u32 format = stream->oa_buffer.format;
+ u32 offset = stream->perf->ctx_oactxctrl_offset;
struct flex regs_context[] = {
{
GEN8_OACTXCONTROL,
- stream->perf->ctx_oactxctrl_offset + 1,
+ offset + 1,
active ? GEN8_OA_COUNTER_RESUME : 0,
},
};
@@ -2434,12 +2522,13 @@ static int gen12_configure_oar_context(struct i915_perf_stream *stream,
},
};
- /* Modify the context image of pinned context with regs_context*/
+ /* Modify the context image of pinned context with regs_context */
err = intel_context_lock_pinned(ce);
if (err)
return err;
- err = gen8_modify_context(ce, regs_context, ARRAY_SIZE(regs_context));
+ err = gen8_modify_context(ce, regs_context,
+ ARRAY_SIZE(regs_context));
intel_context_unlock_pinned(ce);
if (err)
return err;
@@ -2564,6 +2653,7 @@ lrc_configure_all_contexts(struct i915_perf_stream *stream,
const struct i915_oa_config *oa_config,
struct i915_active *active)
{
+ u32 ctx_oactxctrl = stream->perf->ctx_oactxctrl_offset;
/* The MMIO offsets for Flex EU registers aren't contiguous */
const u32 ctx_flexeu0 = stream->perf->ctx_flexeu0_offset;
#define ctx_flexeuN(N) (ctx_flexeu0 + 2 * (N) + 1)
@@ -2574,7 +2664,7 @@ lrc_configure_all_contexts(struct i915_perf_stream *stream,
},
{
GEN8_OACTXCONTROL,
- stream->perf->ctx_oactxctrl_offset + 1,
+ ctx_oactxctrl + 1,
},
{ EU_PERF_CNTL0, ctx_flexeuN(0) },
{ EU_PERF_CNTL1, ctx_flexeuN(1) },
@@ -4543,6 +4633,37 @@ static void oa_init_supported_formats(struct i915_perf *perf)
}
}
+static void i915_perf_init_info(struct drm_i915_private *i915)
+{
+ struct i915_perf *perf = &i915->perf;
+
+ switch (GRAPHICS_VER(i915)) {
+ case 8:
+ perf->ctx_oactxctrl_offset = 0x120;
+ perf->ctx_flexeu0_offset = 0x2ce;
+ perf->gen8_valid_ctx_bit = BIT(25);
+ break;
+ case 9:
+ perf->ctx_oactxctrl_offset = 0x128;
+ perf->ctx_flexeu0_offset = 0x3de;
+ perf->gen8_valid_ctx_bit = BIT(16);
+ break;
+ case 11:
+ perf->ctx_oactxctrl_offset = 0x124;
+ perf->ctx_flexeu0_offset = 0x78e;
+ perf->gen8_valid_ctx_bit = BIT(16);
+ break;
+ case 12:
+ /*
+ * Calculate offset at runtime in oa_pin_context for gen12 and
+ * cache the value in perf->ctx_oactxctrl_offset.
+ */
+ break;
+ default:
+ MISSING_CASE(GRAPHICS_VER(i915));
+ }
+}
+
/**
* i915_perf_init - initialize i915-perf state on module bind
* @i915: i915 device instance
@@ -4581,6 +4702,7 @@ void i915_perf_init(struct drm_i915_private *i915)
* execlist mode by default.
*/
perf->ops.read = gen8_oa_read;
+ i915_perf_init_info(i915);
if (IS_GRAPHICS_VER(i915, 8, 9)) {
perf->ops.is_valid_b_counter_reg =
@@ -4600,18 +4722,6 @@ void i915_perf_init(struct drm_i915_private *i915)
perf->ops.enable_metric_set = gen8_enable_metric_set;
perf->ops.disable_metric_set = gen8_disable_metric_set;
perf->ops.oa_hw_tail_read = gen8_oa_hw_tail_read;
-
- if (GRAPHICS_VER(i915) == 8) {
- perf->ctx_oactxctrl_offset = 0x120;
- perf->ctx_flexeu0_offset = 0x2ce;
-
- perf->gen8_valid_ctx_bit = BIT(25);
- } else {
- perf->ctx_oactxctrl_offset = 0x128;
- perf->ctx_flexeu0_offset = 0x3de;
-
- perf->gen8_valid_ctx_bit = BIT(16);
- }
} else if (GRAPHICS_VER(i915) == 11) {
perf->ops.is_valid_b_counter_reg =
gen7_is_valid_b_counter_addr;
@@ -4625,11 +4735,6 @@ void i915_perf_init(struct drm_i915_private *i915)
perf->ops.enable_metric_set = gen8_enable_metric_set;
perf->ops.disable_metric_set = gen11_disable_metric_set;
perf->ops.oa_hw_tail_read = gen8_oa_hw_tail_read;
-
- perf->ctx_oactxctrl_offset = 0x124;
- perf->ctx_flexeu0_offset = 0x78e;
-
- perf->gen8_valid_ctx_bit = BIT(16);
} else if (GRAPHICS_VER(i915) == 12) {
perf->ops.is_valid_b_counter_reg =
gen12_is_valid_b_counter_addr;
@@ -4643,9 +4748,6 @@ void i915_perf_init(struct drm_i915_private *i915)
perf->ops.enable_metric_set = gen12_enable_metric_set;
perf->ops.disable_metric_set = gen12_disable_metric_set;
perf->ops.oa_hw_tail_read = gen12_oa_hw_tail_read;
-
- perf->ctx_flexeu0_offset = 0;
- perf->ctx_oactxctrl_offset = 0x144;
}
}
diff --git a/drivers/gpu/drm/i915/i915_perf_oa_regs.h b/drivers/gpu/drm/i915/i915_perf_oa_regs.h
index f31c9f13a9fc..0ef3562ff4aa 100644
--- a/drivers/gpu/drm/i915/i915_perf_oa_regs.h
+++ b/drivers/gpu/drm/i915/i915_perf_oa_regs.h
@@ -97,7 +97,7 @@
#define GEN12_OAR_OACONTROL_COUNTER_FORMAT_SHIFT 1
#define GEN12_OAR_OACONTROL_COUNTER_ENABLE (1 << 0)
-#define GEN12_OACTXCONTROL _MMIO(0x2360)
+#define GEN12_OACTXCONTROL(base) _MMIO((base) + 0x360)
#define GEN12_OAR_OASTATUS _MMIO(0x2968)
/* Gen12 OAG unit */
--
2.25.1
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [Intel-gfx] [PATCH v4 05/16] drm/i915/perf: Enable bytes per clock reporting in OA
2022-10-12 22:27 [Intel-gfx] [PATCH v4 00/16] Add DG2 OA support Umesh Nerlige Ramappa
` (3 preceding siblings ...)
2022-10-12 22:27 ` [Intel-gfx] [PATCH v4 04/16] drm/i915/perf: Determine gen12 oa ctx offset at runtime Umesh Nerlige Ramappa
@ 2022-10-12 22:27 ` Umesh Nerlige Ramappa
2022-10-12 22:27 ` [Intel-gfx] [PATCH v4 06/16] drm/i915/perf: Simply use stream->ctx Umesh Nerlige Ramappa
` (14 subsequent siblings)
19 siblings, 0 replies; 23+ messages in thread
From: Umesh Nerlige Ramappa @ 2022-10-12 22:27 UTC (permalink / raw)
To: intel-gfx, Lionel G Landwerlin, Ashutosh Dixit
XEHPSDV and DG2 provide a way to configure bytes per clock vs commands
per clock reporting. Enable bytes per clock setting on enabling OA.
Bspec: 51762
Bspec: 52201
v2:
- Fix commit msg (Ashutosh)
- Fix checkpatch issues
v3:
- s/commands/bytes/ in code comment and commmit msg
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
---
drivers/gpu/drm/i915/i915_drv.h | 3 +++
drivers/gpu/drm/i915/i915_pci.c | 1 +
drivers/gpu/drm/i915/i915_perf.c | 20 ++++++++++++++++++++
drivers/gpu/drm/i915/i915_perf_oa_regs.h | 4 ++++
drivers/gpu/drm/i915/intel_device_info.h | 1 +
5 files changed, 29 insertions(+)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 90ed8e6db2fe..4de2a20b664b 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -896,6 +896,9 @@ IS_SUBPLATFORM(const struct drm_i915_private *i915,
#define HAS_RUNTIME_PM(dev_priv) (INTEL_INFO(dev_priv)->has_runtime_pm)
#define HAS_64BIT_RELOC(dev_priv) (INTEL_INFO(dev_priv)->has_64bit_reloc)
+#define HAS_OA_BPC_REPORTING(dev_priv) \
+ (INTEL_INFO(dev_priv)->has_oa_bpc_reporting)
+
/*
* Set this flag, when platform requires 64K GTT page sizes or larger for
* device local memory access.
diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
index 38460a0bd7cb..196ece92845b 100644
--- a/drivers/gpu/drm/i915/i915_pci.c
+++ b/drivers/gpu/drm/i915/i915_pci.c
@@ -1023,6 +1023,7 @@ static const struct intel_device_info adl_p_info = {
.has_logical_ring_contexts = 1, \
.has_logical_ring_elsq = 1, \
.has_mslice_steering = 1, \
+ .has_oa_bpc_reporting = 1, \
.has_rc6 = 1, \
.has_reset_engine = 1, \
.has_rps = 1, \
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 0ba61daedf6d..0f2d531fd831 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -2752,10 +2752,12 @@ static int
gen12_enable_metric_set(struct i915_perf_stream *stream,
struct i915_active *active)
{
+ struct drm_i915_private *i915 = stream->perf->i915;
struct intel_uncore *uncore = stream->uncore;
struct i915_oa_config *oa_config = stream->oa_config;
bool periodic = stream->periodic;
u32 period_exponent = stream->period_exponent;
+ u32 sqcnt1;
int ret;
intel_uncore_write(uncore, GEN12_OAG_OA_DEBUG,
@@ -2774,6 +2776,16 @@ gen12_enable_metric_set(struct i915_perf_stream *stream,
(period_exponent << GEN12_OAG_OAGLBCTXCTRL_TIMER_PERIOD_SHIFT))
: 0);
+ /*
+ * Initialize Super Queue Internal Cnt Register
+ * Set PMON Enable in order to collect valid metrics.
+ * Enable byets per clock reporting in OA for XEHPSDV onward.
+ */
+ sqcnt1 = GEN12_SQCNT1_PMON_ENABLE |
+ (HAS_OA_BPC_REPORTING(i915) ? GEN12_SQCNT1_OABPC : 0);
+
+ intel_uncore_rmw(uncore, GEN12_SQCNT1, 0, sqcnt1);
+
/*
* Update all contexts prior writing the mux configurations as we need
* to make sure all slices/subslices are ON before writing to NOA
@@ -2823,6 +2835,8 @@ static void gen11_disable_metric_set(struct i915_perf_stream *stream)
static void gen12_disable_metric_set(struct i915_perf_stream *stream)
{
struct intel_uncore *uncore = stream->uncore;
+ struct drm_i915_private *i915 = stream->perf->i915;
+ u32 sqcnt1;
/* Reset all contexts' slices/subslices configurations. */
gen12_configure_all_contexts(stream, NULL, NULL);
@@ -2833,6 +2847,12 @@ static void gen12_disable_metric_set(struct i915_perf_stream *stream)
/* Make sure we disable noa to save power. */
intel_uncore_rmw(uncore, RPM_CONFIG1, GEN10_GT_NOA_ENABLE, 0);
+
+ sqcnt1 = GEN12_SQCNT1_PMON_ENABLE |
+ (HAS_OA_BPC_REPORTING(i915) ? GEN12_SQCNT1_OABPC : 0);
+
+ /* Reset PMON Enable to save power. */
+ intel_uncore_rmw(uncore, GEN12_SQCNT1, sqcnt1, 0);
}
static void gen7_oa_enable(struct i915_perf_stream *stream)
diff --git a/drivers/gpu/drm/i915/i915_perf_oa_regs.h b/drivers/gpu/drm/i915/i915_perf_oa_regs.h
index 0ef3562ff4aa..381d94101610 100644
--- a/drivers/gpu/drm/i915/i915_perf_oa_regs.h
+++ b/drivers/gpu/drm/i915/i915_perf_oa_regs.h
@@ -134,4 +134,8 @@
#define GDT_CHICKEN_BITS _MMIO(0x9840)
#define GT_NOA_ENABLE 0x00000080
+#define GEN12_SQCNT1 _MMIO(0x8718)
+#define GEN12_SQCNT1_PMON_ENABLE REG_BIT(30)
+#define GEN12_SQCNT1_OABPC REG_BIT(29)
+
#endif /* __INTEL_PERF_OA_REGS__ */
diff --git a/drivers/gpu/drm/i915/intel_device_info.h b/drivers/gpu/drm/i915/intel_device_info.h
index bc87d3156b14..25f7f375a3c4 100644
--- a/drivers/gpu/drm/i915/intel_device_info.h
+++ b/drivers/gpu/drm/i915/intel_device_info.h
@@ -165,6 +165,7 @@ enum intel_ppgtt_type {
func(has_logical_ring_elsq); \
func(has_media_ratio_mode); \
func(has_mslice_steering); \
+ func(has_oa_bpc_reporting); \
func(has_one_eu_per_fuse_bit); \
func(has_pxp); \
func(has_rc6); \
--
2.25.1
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [Intel-gfx] [PATCH v4 06/16] drm/i915/perf: Simply use stream->ctx
2022-10-12 22:27 [Intel-gfx] [PATCH v4 00/16] Add DG2 OA support Umesh Nerlige Ramappa
` (4 preceding siblings ...)
2022-10-12 22:27 ` [Intel-gfx] [PATCH v4 05/16] drm/i915/perf: Enable bytes per clock reporting in OA Umesh Nerlige Ramappa
@ 2022-10-12 22:27 ` Umesh Nerlige Ramappa
2022-10-12 22:27 ` [Intel-gfx] [PATCH v4 07/16] drm/i915/perf: Move gt-specific data from i915->perf to gt->perf Umesh Nerlige Ramappa
` (13 subsequent siblings)
19 siblings, 0 replies; 23+ messages in thread
From: Umesh Nerlige Ramappa @ 2022-10-12 22:27 UTC (permalink / raw)
To: intel-gfx, Lionel G Landwerlin, Ashutosh Dixit
Earlier code used exclusive_stream to check for user passed context.
Simplify this by accessing stream->ctx.
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
drivers/gpu/drm/i915/i915_perf.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 0f2d531fd831..595f2f349d24 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -776,7 +776,7 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
* switches since it's not-uncommon for periodic samples to
* identify a switch before any 'context switch' report.
*/
- if (!stream->perf->exclusive_stream->ctx ||
+ if (!stream->ctx ||
stream->specific_ctx_id == ctx_id ||
stream->oa_buffer.last_ctx_id == stream->specific_ctx_id ||
reason & OAREPORT_REASON_CTX_SWITCH) {
@@ -785,7 +785,7 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
* While filtering for a single context we avoid
* leaking the IDs of other contexts.
*/
- if (stream->perf->exclusive_stream->ctx &&
+ if (stream->ctx &&
stream->specific_ctx_id != ctx_id) {
report32[2] = INVALID_CTX_ID;
}
--
2.25.1
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [Intel-gfx] [PATCH v4 07/16] drm/i915/perf: Move gt-specific data from i915->perf to gt->perf
2022-10-12 22:27 [Intel-gfx] [PATCH v4 00/16] Add DG2 OA support Umesh Nerlige Ramappa
` (5 preceding siblings ...)
2022-10-12 22:27 ` [Intel-gfx] [PATCH v4 06/16] drm/i915/perf: Simply use stream->ctx Umesh Nerlige Ramappa
@ 2022-10-12 22:27 ` Umesh Nerlige Ramappa
2022-10-12 22:27 ` [Intel-gfx] [PATCH v4 08/16] drm/i915/perf: Replace gt->perf.lock with stream->lock for file ops Umesh Nerlige Ramappa
` (12 subsequent siblings)
19 siblings, 0 replies; 23+ messages in thread
From: Umesh Nerlige Ramappa @ 2022-10-12 22:27 UTC (permalink / raw)
To: intel-gfx, Lionel G Landwerlin, Ashutosh Dixit
Make perf part of gt as the OAG buffer is specific to a gt. The refactor
eventually simplifies programming the right OA buffer and the right HW
registers when supporting multiple gts.
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
---
drivers/gpu/drm/i915/gt/intel_gt_types.h | 3 +
drivers/gpu/drm/i915/gt/intel_sseu.c | 4 +-
drivers/gpu/drm/i915/i915_perf.c | 75 +++++++++++++---------
drivers/gpu/drm/i915/i915_perf_types.h | 39 +++++------
drivers/gpu/drm/i915/selftests/i915_perf.c | 16 +++--
5 files changed, 80 insertions(+), 57 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_types.h b/drivers/gpu/drm/i915/gt/intel_gt_types.h
index 30003d68fd51..aa5f162626e2 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_types.h
@@ -20,6 +20,7 @@
#include "intel_gsc.h"
#include "i915_vma.h"
+#include "i915_perf_types.h"
#include "intel_engine_types.h"
#include "intel_gt_buffer_pool_types.h"
#include "intel_hwconfig.h"
@@ -287,6 +288,8 @@ struct intel_gt {
/* sysfs defaults per gt */
struct gt_defaults defaults;
struct kobject *sysfs_defaults;
+
+ struct i915_perf_gt perf;
};
struct intel_gt_definition {
diff --git a/drivers/gpu/drm/i915/gt/intel_sseu.c b/drivers/gpu/drm/i915/gt/intel_sseu.c
index 66f21c735d54..6c6198a257ac 100644
--- a/drivers/gpu/drm/i915/gt/intel_sseu.c
+++ b/drivers/gpu/drm/i915/gt/intel_sseu.c
@@ -677,8 +677,8 @@ u32 intel_sseu_make_rpcs(struct intel_gt *gt,
* If i915/perf is active, we want a stable powergating configuration
* on the system. Use the configuration pinned by i915/perf.
*/
- if (i915->perf.exclusive_stream)
- req_sseu = &i915->perf.sseu;
+ if (gt->perf.exclusive_stream)
+ req_sseu = >->perf.sseu;
slices = hweight8(req_sseu->slice_mask);
subslices = hweight8(req_sseu->subslice_mask);
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 595f2f349d24..b544a5ff8eb2 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -1569,8 +1569,9 @@ free_noa_wait(struct i915_perf_stream *stream)
static void i915_oa_stream_destroy(struct i915_perf_stream *stream)
{
struct i915_perf *perf = stream->perf;
+ struct intel_gt *gt = stream->engine->gt;
- if (WARN_ON(stream != perf->exclusive_stream))
+ if (WARN_ON(stream != gt->perf.exclusive_stream))
return;
/*
@@ -1579,7 +1580,7 @@ static void i915_oa_stream_destroy(struct i915_perf_stream *stream)
*
* See i915_oa_init_reg_state() and lrc_configure_all_contexts()
*/
- WRITE_ONCE(perf->exclusive_stream, NULL);
+ WRITE_ONCE(gt->perf.exclusive_stream, NULL);
perf->ops.disable_metric_set(stream);
free_oa_buffer(stream);
@@ -2570,10 +2571,11 @@ oa_configure_all_contexts(struct i915_perf_stream *stream,
{
struct drm_i915_private *i915 = stream->perf->i915;
struct intel_engine_cs *engine;
+ struct intel_gt *gt = stream->engine->gt;
struct i915_gem_context *ctx, *cn;
int err;
- lockdep_assert_held(&stream->perf->lock);
+ lockdep_assert_held(>->perf.lock);
/*
* The OA register config is setup through the context image. This image
@@ -3094,6 +3096,7 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
{
struct drm_i915_private *i915 = stream->perf->i915;
struct i915_perf *perf = stream->perf;
+ struct intel_gt *gt;
int format_size;
int ret;
@@ -3102,6 +3105,7 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
"OA engine not specified\n");
return -EINVAL;
}
+ gt = props->engine->gt;
/*
* If the sysfs metrics/ directory wasn't registered for some
@@ -3132,7 +3136,7 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
* counter reports and marshal to the appropriate client
* we currently only allow exclusive access
*/
- if (perf->exclusive_stream) {
+ if (gt->perf.exclusive_stream) {
drm_dbg(&stream->perf->i915->drm,
"OA unit already in use\n");
return -EBUSY;
@@ -3212,8 +3216,8 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
stream->ops = &i915_oa_stream_ops;
- perf->sseu = props->sseu;
- WRITE_ONCE(perf->exclusive_stream, stream);
+ stream->engine->gt->perf.sseu = props->sseu;
+ WRITE_ONCE(gt->perf.exclusive_stream, stream);
ret = i915_perf_stream_enable_sync(stream);
if (ret) {
@@ -3235,7 +3239,7 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
return 0;
err_enable:
- WRITE_ONCE(perf->exclusive_stream, NULL);
+ WRITE_ONCE(gt->perf.exclusive_stream, NULL);
perf->ops.disable_metric_set(stream);
free_oa_buffer(stream);
@@ -3265,7 +3269,7 @@ void i915_oa_init_reg_state(const struct intel_context *ce,
return;
/* perf.exclusive_stream serialised by lrc_configure_all_contexts() */
- stream = READ_ONCE(engine->i915->perf.exclusive_stream);
+ stream = READ_ONCE(engine->gt->perf.exclusive_stream);
if (stream && GRAPHICS_VER(stream->perf->i915) < 12)
gen8_update_reg_state_unlocked(ce, stream);
}
@@ -3294,7 +3298,7 @@ static ssize_t i915_perf_read(struct file *file,
loff_t *ppos)
{
struct i915_perf_stream *stream = file->private_data;
- struct i915_perf *perf = stream->perf;
+ struct intel_gt *gt = stream->engine->gt;
size_t offset = 0;
int ret;
@@ -3318,14 +3322,14 @@ static ssize_t i915_perf_read(struct file *file,
if (ret)
return ret;
- mutex_lock(&perf->lock);
+ mutex_lock(>->perf.lock);
ret = stream->ops->read(stream, buf, count, &offset);
- mutex_unlock(&perf->lock);
+ mutex_unlock(>->perf.lock);
} while (!offset && !ret);
} else {
- mutex_lock(&perf->lock);
+ mutex_lock(>->perf.lock);
ret = stream->ops->read(stream, buf, count, &offset);
- mutex_unlock(&perf->lock);
+ mutex_unlock(>->perf.lock);
}
/* We allow the poll checking to sometimes report false positive EPOLLIN
@@ -3372,7 +3376,7 @@ static enum hrtimer_restart oa_poll_check_timer_cb(struct hrtimer *hrtimer)
* &i915_perf_stream_ops->poll_wait to call poll_wait() with a wait queue that
* will be woken for new stream data.
*
- * Note: The &perf->lock mutex has been taken to serialize
+ * Note: The >->perf.lock mutex has been taken to serialize
* with any non-file-operation driver hooks.
*
* Returns: any poll events that are ready without sleeping
@@ -3413,12 +3417,12 @@ static __poll_t i915_perf_poll_locked(struct i915_perf_stream *stream,
static __poll_t i915_perf_poll(struct file *file, poll_table *wait)
{
struct i915_perf_stream *stream = file->private_data;
- struct i915_perf *perf = stream->perf;
+ struct intel_gt *gt = stream->engine->gt;
__poll_t ret;
- mutex_lock(&perf->lock);
+ mutex_lock(>->perf.lock);
ret = i915_perf_poll_locked(stream, file, wait);
- mutex_unlock(&perf->lock);
+ mutex_unlock(>->perf.lock);
return ret;
}
@@ -3517,7 +3521,7 @@ static long i915_perf_config_locked(struct i915_perf_stream *stream,
* @cmd: the ioctl request
* @arg: the ioctl data
*
- * Note: The &perf->lock mutex has been taken to serialize
+ * Note: The >->perf.lock mutex has been taken to serialize
* with any non-file-operation driver hooks.
*
* Returns: zero on success or a negative error code. Returns -EINVAL for
@@ -3557,12 +3561,12 @@ static long i915_perf_ioctl(struct file *file,
unsigned long arg)
{
struct i915_perf_stream *stream = file->private_data;
- struct i915_perf *perf = stream->perf;
+ struct intel_gt *gt = stream->engine->gt;
long ret;
- mutex_lock(&perf->lock);
+ mutex_lock(>->perf.lock);
ret = i915_perf_ioctl_locked(stream, cmd, arg);
- mutex_unlock(&perf->lock);
+ mutex_unlock(>->perf.lock);
return ret;
}
@@ -3574,7 +3578,7 @@ static long i915_perf_ioctl(struct file *file,
* Frees all resources associated with the given i915 perf @stream, disabling
* any associated data capture in the process.
*
- * Note: The &perf->lock mutex has been taken to serialize
+ * Note: The >->perf.lock mutex has been taken to serialize
* with any non-file-operation driver hooks.
*/
static void i915_perf_destroy_locked(struct i915_perf_stream *stream)
@@ -3606,10 +3610,11 @@ static int i915_perf_release(struct inode *inode, struct file *file)
{
struct i915_perf_stream *stream = file->private_data;
struct i915_perf *perf = stream->perf;
+ struct intel_gt *gt = stream->engine->gt;
- mutex_lock(&perf->lock);
+ mutex_lock(>->perf.lock);
i915_perf_destroy_locked(stream);
- mutex_unlock(&perf->lock);
+ mutex_unlock(>->perf.lock);
/* Release the reference the perf stream kept on the driver. */
drm_dev_put(&perf->i915->drm);
@@ -3642,7 +3647,7 @@ static const struct file_operations fops = {
* See i915_perf_ioctl_open() for interface details.
*
* Implements further stream config validation and stream initialization on
- * behalf of i915_perf_open_ioctl() with the &perf->lock mutex
+ * behalf of i915_perf_open_ioctl() with the >->perf.lock mutex
* taken to serialize with any non-file-operation driver hooks.
*
* Note: at this point the @props have only been validated in isolation and
@@ -4026,7 +4031,7 @@ static int read_properties_unlocked(struct i915_perf *perf,
* mutex to avoid an awkward lockdep with mmap_lock.
*
* Most of the implementation details are handled by
- * i915_perf_open_ioctl_locked() after taking the &perf->lock
+ * i915_perf_open_ioctl_locked() after taking the >->perf.lock
* mutex for serializing with any non-file-operation driver hooks.
*
* Return: A newly opened i915 Perf stream file descriptor or negative
@@ -4037,6 +4042,7 @@ int i915_perf_open_ioctl(struct drm_device *dev, void *data,
{
struct i915_perf *perf = &to_i915(dev)->perf;
struct drm_i915_perf_open_param *param = data;
+ struct intel_gt *gt;
struct perf_open_properties props;
u32 known_open_flags;
int ret;
@@ -4063,9 +4069,11 @@ int i915_perf_open_ioctl(struct drm_device *dev, void *data,
if (ret)
return ret;
- mutex_lock(&perf->lock);
+ gt = props.engine->gt;
+
+ mutex_lock(>->perf.lock);
ret = i915_perf_open_ioctl_locked(perf, param, &props, file);
- mutex_unlock(&perf->lock);
+ mutex_unlock(>->perf.lock);
return ret;
}
@@ -4081,6 +4089,7 @@ int i915_perf_open_ioctl(struct drm_device *dev, void *data,
void i915_perf_register(struct drm_i915_private *i915)
{
struct i915_perf *perf = &i915->perf;
+ struct intel_gt *gt = to_gt(i915);
if (!perf->i915)
return;
@@ -4089,13 +4098,13 @@ void i915_perf_register(struct drm_i915_private *i915)
* i915_perf_open_ioctl(); considering that we register after
* being exposed to userspace.
*/
- mutex_lock(&perf->lock);
+ mutex_lock(>->perf.lock);
perf->metrics_kobj =
kobject_create_and_add("metrics",
&i915->drm.primary->kdev->kobj);
- mutex_unlock(&perf->lock);
+ mutex_unlock(>->perf.lock);
}
/**
@@ -4772,7 +4781,11 @@ void i915_perf_init(struct drm_i915_private *i915)
}
if (perf->ops.enable_metric_set) {
- mutex_init(&perf->lock);
+ struct intel_gt *gt;
+ int i;
+
+ for_each_gt(gt, i915, i)
+ mutex_init(>->perf.lock);
/* Choose a representative limit */
oa_sample_rate_hard_limit = to_gt(i915)->clock_frequency / 2;
diff --git a/drivers/gpu/drm/i915/i915_perf_types.h b/drivers/gpu/drm/i915/i915_perf_types.h
index 05cb9a335a97..e888bfab478f 100644
--- a/drivers/gpu/drm/i915/i915_perf_types.h
+++ b/drivers/gpu/drm/i915/i915_perf_types.h
@@ -380,6 +380,26 @@ struct i915_oa_ops {
u32 (*oa_hw_tail_read)(struct i915_perf_stream *stream);
};
+struct i915_perf_gt {
+ /*
+ * Lock associated with anything below within this structure.
+ */
+ struct mutex lock;
+
+ /**
+ * @sseu: sseu configuration selected to run while perf is active,
+ * applies to all contexts.
+ */
+ struct intel_sseu sseu;
+
+ /*
+ * @exclusive_stream: The stream currently using the OA unit. This is
+ * sometimes accessed outside a syscall associated to its file
+ * descriptor.
+ */
+ struct i915_perf_stream *exclusive_stream;
+};
+
struct i915_perf {
struct drm_i915_private *i915;
@@ -397,25 +417,6 @@ struct i915_perf {
*/
struct idr metrics_idr;
- /*
- * Lock associated with anything below within this structure
- * except exclusive_stream.
- */
- struct mutex lock;
-
- /*
- * The stream currently using the OA unit. If accessed
- * outside a syscall associated to its file
- * descriptor.
- */
- struct i915_perf_stream *exclusive_stream;
-
- /**
- * @sseu: sseu configuration selected to run while perf is active,
- * applies to all contexts.
- */
- struct intel_sseu sseu;
-
/**
* For rate limiting any notifications of spurious
* invalid OA reports
diff --git a/drivers/gpu/drm/i915/selftests/i915_perf.c b/drivers/gpu/drm/i915/selftests/i915_perf.c
index 429c6d73b159..24dde5531423 100644
--- a/drivers/gpu/drm/i915/selftests/i915_perf.c
+++ b/drivers/gpu/drm/i915/selftests/i915_perf.c
@@ -102,6 +102,12 @@ test_stream(struct i915_perf *perf)
I915_OA_FORMAT_A32u40_A4u32_B8_C8 : I915_OA_FORMAT_C4_B8,
};
struct i915_perf_stream *stream;
+ struct intel_gt *gt;
+
+ if (!props.engine)
+ return NULL;
+
+ gt = props.engine->gt;
if (!oa_config)
return NULL;
@@ -116,12 +122,12 @@ test_stream(struct i915_perf *perf)
stream->perf = perf;
- mutex_lock(&perf->lock);
+ mutex_lock(>->perf.lock);
if (i915_oa_stream_init(stream, ¶m, &props)) {
kfree(stream);
stream = NULL;
}
- mutex_unlock(&perf->lock);
+ mutex_unlock(>->perf.lock);
i915_oa_config_put(oa_config);
@@ -130,11 +136,11 @@ test_stream(struct i915_perf *perf)
static void stream_destroy(struct i915_perf_stream *stream)
{
- struct i915_perf *perf = stream->perf;
+ struct intel_gt *gt = stream->engine->gt;
- mutex_lock(&perf->lock);
+ mutex_lock(>->perf.lock);
i915_perf_destroy_locked(stream);
- mutex_unlock(&perf->lock);
+ mutex_unlock(>->perf.lock);
}
static int live_sanitycheck(void *arg)
--
2.25.1
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [Intel-gfx] [PATCH v4 08/16] drm/i915/perf: Replace gt->perf.lock with stream->lock for file ops
2022-10-12 22:27 [Intel-gfx] [PATCH v4 00/16] Add DG2 OA support Umesh Nerlige Ramappa
` (6 preceding siblings ...)
2022-10-12 22:27 ` [Intel-gfx] [PATCH v4 07/16] drm/i915/perf: Move gt-specific data from i915->perf to gt->perf Umesh Nerlige Ramappa
@ 2022-10-12 22:27 ` Umesh Nerlige Ramappa
2022-10-12 22:27 ` [Intel-gfx] [PATCH v4 09/16] drm/i915/perf: Use gt-specific ggtt for OA and noa-wait buffers Umesh Nerlige Ramappa
` (11 subsequent siblings)
19 siblings, 0 replies; 23+ messages in thread
From: Umesh Nerlige Ramappa @ 2022-10-12 22:27 UTC (permalink / raw)
To: intel-gfx, Lionel G Landwerlin, Ashutosh Dixit
With multi-gt, user can access multiple OA buffers concurrently. Use
stream->lock instead of gt->perf.lock to serialize file operations.
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
---
drivers/gpu/drm/i915/i915_perf.c | 31 ++++++++++++--------------
drivers/gpu/drm/i915/i915_perf_types.h | 5 +++++
2 files changed, 19 insertions(+), 17 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index b544a5ff8eb2..ac7579cb3586 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -3235,6 +3235,7 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
stream->poll_check_timer.function = oa_poll_check_timer_cb;
init_waitqueue_head(&stream->poll_wq);
spin_lock_init(&stream->oa_buffer.ptr_lock);
+ mutex_init(&stream->lock);
return 0;
@@ -3298,7 +3299,6 @@ static ssize_t i915_perf_read(struct file *file,
loff_t *ppos)
{
struct i915_perf_stream *stream = file->private_data;
- struct intel_gt *gt = stream->engine->gt;
size_t offset = 0;
int ret;
@@ -3322,14 +3322,14 @@ static ssize_t i915_perf_read(struct file *file,
if (ret)
return ret;
- mutex_lock(>->perf.lock);
+ mutex_lock(&stream->lock);
ret = stream->ops->read(stream, buf, count, &offset);
- mutex_unlock(>->perf.lock);
+ mutex_unlock(&stream->lock);
} while (!offset && !ret);
} else {
- mutex_lock(>->perf.lock);
+ mutex_lock(&stream->lock);
ret = stream->ops->read(stream, buf, count, &offset);
- mutex_unlock(>->perf.lock);
+ mutex_unlock(&stream->lock);
}
/* We allow the poll checking to sometimes report false positive EPOLLIN
@@ -3376,9 +3376,6 @@ static enum hrtimer_restart oa_poll_check_timer_cb(struct hrtimer *hrtimer)
* &i915_perf_stream_ops->poll_wait to call poll_wait() with a wait queue that
* will be woken for new stream data.
*
- * Note: The >->perf.lock mutex has been taken to serialize
- * with any non-file-operation driver hooks.
- *
* Returns: any poll events that are ready without sleeping
*/
static __poll_t i915_perf_poll_locked(struct i915_perf_stream *stream,
@@ -3417,12 +3414,11 @@ static __poll_t i915_perf_poll_locked(struct i915_perf_stream *stream,
static __poll_t i915_perf_poll(struct file *file, poll_table *wait)
{
struct i915_perf_stream *stream = file->private_data;
- struct intel_gt *gt = stream->engine->gt;
__poll_t ret;
- mutex_lock(>->perf.lock);
+ mutex_lock(&stream->lock);
ret = i915_perf_poll_locked(stream, file, wait);
- mutex_unlock(>->perf.lock);
+ mutex_unlock(&stream->lock);
return ret;
}
@@ -3521,9 +3517,6 @@ static long i915_perf_config_locked(struct i915_perf_stream *stream,
* @cmd: the ioctl request
* @arg: the ioctl data
*
- * Note: The >->perf.lock mutex has been taken to serialize
- * with any non-file-operation driver hooks.
- *
* Returns: zero on success or a negative error code. Returns -EINVAL for
* an unknown ioctl request.
*/
@@ -3561,12 +3554,11 @@ static long i915_perf_ioctl(struct file *file,
unsigned long arg)
{
struct i915_perf_stream *stream = file->private_data;
- struct intel_gt *gt = stream->engine->gt;
long ret;
- mutex_lock(>->perf.lock);
+ mutex_lock(&stream->lock);
ret = i915_perf_ioctl_locked(stream, cmd, arg);
- mutex_unlock(>->perf.lock);
+ mutex_unlock(&stream->lock);
return ret;
}
@@ -3612,6 +3604,11 @@ static int i915_perf_release(struct inode *inode, struct file *file)
struct i915_perf *perf = stream->perf;
struct intel_gt *gt = stream->engine->gt;
+ /*
+ * Within this call, we know that the fd is being closed and we have no
+ * other user of stream->lock. Use the perf lock to destroy the stream
+ * here.
+ */
mutex_lock(>->perf.lock);
i915_perf_destroy_locked(stream);
mutex_unlock(>->perf.lock);
diff --git a/drivers/gpu/drm/i915/i915_perf_types.h b/drivers/gpu/drm/i915/i915_perf_types.h
index e888bfab478f..dc9bfd8086cf 100644
--- a/drivers/gpu/drm/i915/i915_perf_types.h
+++ b/drivers/gpu/drm/i915/i915_perf_types.h
@@ -146,6 +146,11 @@ struct i915_perf_stream {
*/
struct intel_engine_cs *engine;
+ /*
+ * Lock associated with operations on stream
+ */
+ struct mutex lock;
+
/**
* @sample_flags: Flags representing the `DRM_I915_PERF_PROP_SAMPLE_*`
* properties given when opening a stream, representing the contents
--
2.25.1
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [Intel-gfx] [PATCH v4 09/16] drm/i915/perf: Use gt-specific ggtt for OA and noa-wait buffers
2022-10-12 22:27 [Intel-gfx] [PATCH v4 00/16] Add DG2 OA support Umesh Nerlige Ramappa
` (7 preceding siblings ...)
2022-10-12 22:27 ` [Intel-gfx] [PATCH v4 08/16] drm/i915/perf: Replace gt->perf.lock with stream->lock for file ops Umesh Nerlige Ramappa
@ 2022-10-12 22:27 ` Umesh Nerlige Ramappa
2022-10-12 22:27 ` [Intel-gfx] [PATCH v4 10/16] drm/i915/perf: Store a pointer to oa_format in oa_buffer Umesh Nerlige Ramappa
` (10 subsequent siblings)
19 siblings, 0 replies; 23+ messages in thread
From: Umesh Nerlige Ramappa @ 2022-10-12 22:27 UTC (permalink / raw)
To: intel-gfx, Lionel G Landwerlin, Ashutosh Dixit
User passes uabi engine class and instance to the perf OA interface. Use
gt corresponding to the engine to pin the buffers to the right ggtt.
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
drivers/gpu/drm/i915/i915_perf.c | 21 +++++++++++++++++++--
1 file changed, 19 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index ac7579cb3586..e7d36821ab21 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -1758,6 +1758,7 @@ static void gen12_init_oa_buffer(struct i915_perf_stream *stream)
static int alloc_oa_buffer(struct i915_perf_stream *stream)
{
struct drm_i915_private *i915 = stream->perf->i915;
+ struct intel_gt *gt = stream->engine->gt;
struct drm_i915_gem_object *bo;
struct i915_vma *vma;
int ret;
@@ -1777,11 +1778,22 @@ static int alloc_oa_buffer(struct i915_perf_stream *stream)
i915_gem_object_set_cache_coherency(bo, I915_CACHE_LLC);
/* PreHSW required 512K alignment, HSW requires 16M */
- vma = i915_gem_object_ggtt_pin(bo, NULL, 0, SZ_16M, 0);
+ vma = i915_vma_instance(bo, >->ggtt->vm, NULL);
if (IS_ERR(vma)) {
ret = PTR_ERR(vma);
goto err_unref;
}
+
+ /*
+ * PreHSW required 512K alignment.
+ * HSW and onwards, align to requested size of OA buffer.
+ */
+ ret = i915_vma_pin(vma, 0, SZ_16M, PIN_GLOBAL | PIN_HIGH);
+ if (ret) {
+ drm_err(>->i915->drm, "Failed to pin OA buffer %d\n", ret);
+ goto err_unref;
+ }
+
stream->oa_buffer.vma = vma;
stream->oa_buffer.vaddr =
@@ -1831,6 +1843,7 @@ static u32 *save_restore_register(struct i915_perf_stream *stream, u32 *cs,
static int alloc_noa_wait(struct i915_perf_stream *stream)
{
struct drm_i915_private *i915 = stream->perf->i915;
+ struct intel_gt *gt = stream->engine->gt;
struct drm_i915_gem_object *bo;
struct i915_vma *vma;
const u64 delay_ticks = 0xffffffffffffffff -
@@ -1871,12 +1884,16 @@ static int alloc_noa_wait(struct i915_perf_stream *stream)
* multiple OA config BOs will have a jump to this address and it
* needs to be fixed during the lifetime of the i915/perf stream.
*/
- vma = i915_gem_object_ggtt_pin_ww(bo, &ww, NULL, 0, 0, PIN_HIGH);
+ vma = i915_vma_instance(bo, >->ggtt->vm, NULL);
if (IS_ERR(vma)) {
ret = PTR_ERR(vma);
goto out_ww;
}
+ ret = i915_vma_pin_ww(vma, &ww, 0, 0, PIN_GLOBAL | PIN_HIGH);
+ if (ret)
+ goto out_ww;
+
batch = cs = i915_gem_object_pin_map(bo, I915_MAP_WB);
if (IS_ERR(batch)) {
ret = PTR_ERR(batch);
--
2.25.1
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [Intel-gfx] [PATCH v4 10/16] drm/i915/perf: Store a pointer to oa_format in oa_buffer
2022-10-12 22:27 [Intel-gfx] [PATCH v4 00/16] Add DG2 OA support Umesh Nerlige Ramappa
` (8 preceding siblings ...)
2022-10-12 22:27 ` [Intel-gfx] [PATCH v4 09/16] drm/i915/perf: Use gt-specific ggtt for OA and noa-wait buffers Umesh Nerlige Ramappa
@ 2022-10-12 22:27 ` Umesh Nerlige Ramappa
2022-10-12 22:27 ` [Intel-gfx] [PATCH v4 11/16] drm/i915/perf: Add Wa_1508761755:dg2 Umesh Nerlige Ramappa
` (9 subsequent siblings)
19 siblings, 0 replies; 23+ messages in thread
From: Umesh Nerlige Ramappa @ 2022-10-12 22:27 UTC (permalink / raw)
To: intel-gfx, Lionel G Landwerlin, Ashutosh Dixit
DG2 introduces OA reports with 64 bit report header fields. Perf OA
would need more information about the OA format in order to process such
reports. Store all OA format info in oa_buffer instead of just the size
and format-id.
v2: Drop format_size variable (Ashutosh)
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
---
drivers/gpu/drm/i915/i915_perf.c | 30 +++++++++++---------------
drivers/gpu/drm/i915/i915_perf_types.h | 3 +--
2 files changed, 13 insertions(+), 20 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index e7d36821ab21..fdf4088eca7b 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -465,7 +465,7 @@ static u32 gen7_oa_hw_tail_read(struct i915_perf_stream *stream)
static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
{
u32 gtt_offset = i915_ggtt_offset(stream->oa_buffer.vma);
- int report_size = stream->oa_buffer.format_size;
+ int report_size = stream->oa_buffer.format->size;
unsigned long flags;
bool pollin;
u32 hw_tail;
@@ -602,7 +602,7 @@ static int append_oa_sample(struct i915_perf_stream *stream,
size_t *offset,
const u8 *report)
{
- int report_size = stream->oa_buffer.format_size;
+ int report_size = stream->oa_buffer.format->size;
struct drm_i915_perf_record_header header;
header.type = DRM_I915_PERF_RECORD_SAMPLE;
@@ -652,7 +652,7 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
size_t *offset)
{
struct intel_uncore *uncore = stream->uncore;
- int report_size = stream->oa_buffer.format_size;
+ int report_size = stream->oa_buffer.format->size;
u8 *oa_buf_base = stream->oa_buffer.vaddr;
u32 gtt_offset = i915_ggtt_offset(stream->oa_buffer.vma);
u32 mask = (OA_BUFFER_SIZE - 1);
@@ -945,7 +945,7 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
size_t *offset)
{
struct intel_uncore *uncore = stream->uncore;
- int report_size = stream->oa_buffer.format_size;
+ int report_size = stream->oa_buffer.format->size;
u8 *oa_buf_base = stream->oa_buffer.vaddr;
u32 gtt_offset = i915_ggtt_offset(stream->oa_buffer.vma);
u32 mask = (OA_BUFFER_SIZE - 1);
@@ -2510,7 +2510,7 @@ static int gen12_configure_oar_context(struct i915_perf_stream *stream,
{
int err;
struct intel_context *ce = stream->pinned_ctx;
- u32 format = stream->oa_buffer.format;
+ u32 format = stream->oa_buffer.format->format;
u32 offset = stream->perf->ctx_oactxctrl_offset;
struct flex regs_context[] = {
{
@@ -2881,7 +2881,7 @@ static void gen7_oa_enable(struct i915_perf_stream *stream)
u32 ctx_id = stream->specific_ctx_id;
bool periodic = stream->periodic;
u32 period_exponent = stream->period_exponent;
- u32 report_format = stream->oa_buffer.format;
+ u32 report_format = stream->oa_buffer.format->format;
/*
* Reset buf pointers so we don't forward reports from before now.
@@ -2907,7 +2907,7 @@ static void gen7_oa_enable(struct i915_perf_stream *stream)
static void gen8_oa_enable(struct i915_perf_stream *stream)
{
struct intel_uncore *uncore = stream->uncore;
- u32 report_format = stream->oa_buffer.format;
+ u32 report_format = stream->oa_buffer.format->format;
/*
* Reset buf pointers so we don't forward reports from before now.
@@ -2933,7 +2933,7 @@ static void gen8_oa_enable(struct i915_perf_stream *stream)
static void gen12_oa_enable(struct i915_perf_stream *stream)
{
struct intel_uncore *uncore = stream->uncore;
- u32 report_format = stream->oa_buffer.format;
+ u32 report_format = stream->oa_buffer.format->format;
/*
* If we don't want OA reports from the OA buffer, then we don't even
@@ -3114,7 +3114,6 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
struct drm_i915_private *i915 = stream->perf->i915;
struct i915_perf *perf = stream->perf;
struct intel_gt *gt;
- int format_size;
int ret;
if (!props->engine) {
@@ -3170,20 +3169,15 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
stream->sample_size = sizeof(struct drm_i915_perf_record_header);
- format_size = perf->oa_formats[props->oa_format].size;
+ stream->oa_buffer.format = &perf->oa_formats[props->oa_format];
+ if (drm_WARN_ON(&i915->drm, stream->oa_buffer.format->size == 0))
+ return -EINVAL;
stream->sample_flags = props->sample_flags;
- stream->sample_size += format_size;
-
- stream->oa_buffer.format_size = format_size;
- if (drm_WARN_ON(&i915->drm, stream->oa_buffer.format_size == 0))
- return -EINVAL;
+ stream->sample_size += stream->oa_buffer.format->size;
stream->hold_preemption = props->hold_preemption;
- stream->oa_buffer.format =
- perf->oa_formats[props->oa_format].format;
-
stream->periodic = props->oa_periodic;
if (stream->periodic)
stream->period_exponent = props->oa_period_exponent;
diff --git a/drivers/gpu/drm/i915/i915_perf_types.h b/drivers/gpu/drm/i915/i915_perf_types.h
index dc9bfd8086cf..e0c96b44eda8 100644
--- a/drivers/gpu/drm/i915/i915_perf_types.h
+++ b/drivers/gpu/drm/i915/i915_perf_types.h
@@ -250,11 +250,10 @@ struct i915_perf_stream {
* @oa_buffer: State of the OA buffer.
*/
struct {
+ const struct i915_oa_format *format;
struct i915_vma *vma;
u8 *vaddr;
u32 last_ctx_id;
- int format;
- int format_size;
int size_exponent;
/**
--
2.25.1
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [Intel-gfx] [PATCH v4 11/16] drm/i915/perf: Add Wa_1508761755:dg2
2022-10-12 22:27 [Intel-gfx] [PATCH v4 00/16] Add DG2 OA support Umesh Nerlige Ramappa
` (9 preceding siblings ...)
2022-10-12 22:27 ` [Intel-gfx] [PATCH v4 10/16] drm/i915/perf: Store a pointer to oa_format in oa_buffer Umesh Nerlige Ramappa
@ 2022-10-12 22:27 ` Umesh Nerlige Ramappa
2022-10-12 22:27 ` [Intel-gfx] [PATCH v4 12/16] drm/i915/perf: Apply Wa_18013179988 Umesh Nerlige Ramappa
` (8 subsequent siblings)
19 siblings, 0 replies; 23+ messages in thread
From: Umesh Nerlige Ramappa @ 2022-10-12 22:27 UTC (permalink / raw)
To: intel-gfx, Lionel G Landwerlin, Ashutosh Dixit
Disable Clock gating in EU when gathering the events so that EU events
are not lost.
v2: Fix checkpatch issues
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
---
drivers/gpu/drm/i915/gt/intel_gt_regs.h | 1 +
drivers/gpu/drm/i915/i915_perf.c | 23 +++++++++++++++++++++++
2 files changed, 24 insertions(+)
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_regs.h b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
index 7f79bbf97828..7be0327ca8dc 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
@@ -1140,6 +1140,7 @@
#define GEN12_DISABLE_EARLY_READ REG_BIT(14)
#define GEN12_ENABLE_LARGE_GRF_MODE REG_BIT(12)
#define GEN12_PUSH_CONST_DEREF_HOLD_DIS REG_BIT(8)
+#define GEN12_DISABLE_DOP_GATING REG_BIT(0)
#define RT_CTRL _MMIO(0xe530)
#define DIS_NULL_QUERY REG_BIT(10)
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index fdf4088eca7b..b725f204f496 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -2779,6 +2779,18 @@ gen12_enable_metric_set(struct i915_perf_stream *stream,
u32 sqcnt1;
int ret;
+ /*
+ * Wa_1508761755:xehpsdv, dg2
+ * EU NOA signals behave incorrectly if EU clock gating is enabled.
+ * Disable thread stall DOP gating and EU DOP gating.
+ */
+ if (IS_XEHPSDV(i915) || IS_DG2(i915)) {
+ intel_uncore_write(uncore, GEN8_ROW_CHICKEN,
+ _MASKED_BIT_ENABLE(STALL_DOP_GATING_DISABLE));
+ intel_uncore_write(uncore, GEN7_ROW_CHICKEN2,
+ _MASKED_BIT_ENABLE(GEN12_DISABLE_DOP_GATING));
+ }
+
intel_uncore_write(uncore, GEN12_OAG_OA_DEBUG,
/* Disable clk ratio reports, like previous Gens. */
_MASKED_BIT_ENABLE(GEN12_OAG_OA_DEBUG_DISABLE_CLK_RATIO_REPORTS |
@@ -2857,6 +2869,17 @@ static void gen12_disable_metric_set(struct i915_perf_stream *stream)
struct drm_i915_private *i915 = stream->perf->i915;
u32 sqcnt1;
+ /*
+ * Wa_1508761755:xehpsdv, dg2
+ * Enable thread stall DOP gating and EU DOP gating.
+ */
+ if (IS_XEHPSDV(i915) || IS_DG2(i915)) {
+ intel_uncore_write(uncore, GEN8_ROW_CHICKEN,
+ _MASKED_BIT_DISABLE(STALL_DOP_GATING_DISABLE));
+ intel_uncore_write(uncore, GEN7_ROW_CHICKEN2,
+ _MASKED_BIT_DISABLE(GEN12_DISABLE_DOP_GATING));
+ }
+
/* Reset all contexts' slices/subslices configurations. */
gen12_configure_all_contexts(stream, NULL, NULL);
--
2.25.1
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [Intel-gfx] [PATCH v4 12/16] drm/i915/perf: Apply Wa_18013179988
2022-10-12 22:27 [Intel-gfx] [PATCH v4 00/16] Add DG2 OA support Umesh Nerlige Ramappa
` (10 preceding siblings ...)
2022-10-12 22:27 ` [Intel-gfx] [PATCH v4 11/16] drm/i915/perf: Add Wa_1508761755:dg2 Umesh Nerlige Ramappa
@ 2022-10-12 22:27 ` Umesh Nerlige Ramappa
2022-10-12 22:27 ` [Intel-gfx] [PATCH v4 13/16] drm/i915/perf: Save/restore EU flex counters across reset Umesh Nerlige Ramappa
` (7 subsequent siblings)
19 siblings, 0 replies; 23+ messages in thread
From: Umesh Nerlige Ramappa @ 2022-10-12 22:27 UTC (permalink / raw)
To: intel-gfx, Lionel G Landwerlin, Ashutosh Dixit
OA reports in the OA buffer contain an OA timestamp field that helps
user calculate delta between 2 OA reports. The calculation relies on the
CS timestamp frequency to convert the timestamp value to nanoseconds.
The CS timestamp frequency is a function of the CTC_SHIFT value in
RPM_CONFIG0.
In DG2, OA unit assumes that the CTC_SHIFT is 3, instead of using the
actual value from RPM_CONFIG0. At the user level, this results in an
error in calculating delta between 2 OA reports since the OA timestamp
is not shifted in the same manner as CS timestamp. Also the periodicity
of the reports is different from what the user configured because of
mismatch in the CS and OA frequencies.
The issue also affects MI_REPORT_PERF_COUNT command.
To resolve this, return actual OA timestamp frequency to the user in
i915_getparam_ioctl, so that user can calculate the right OA exponent as
well as interpret the reports correctly.
MR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18893
v2:
- Use REG_FIELD_GET (Ashutosh)
- Update commit msg
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
---
drivers/gpu/drm/i915/i915_getparam.c | 3 +++
drivers/gpu/drm/i915/i915_perf.c | 30 ++++++++++++++++++++++++++--
drivers/gpu/drm/i915/i915_perf.h | 2 ++
include/uapi/drm/i915_drm.h | 6 ++++++
4 files changed, 39 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_getparam.c b/drivers/gpu/drm/i915/i915_getparam.c
index 342c8ca6414e..3047e80e1163 100644
--- a/drivers/gpu/drm/i915/i915_getparam.c
+++ b/drivers/gpu/drm/i915/i915_getparam.c
@@ -175,6 +175,9 @@ int i915_getparam_ioctl(struct drm_device *dev, void *data,
case I915_PARAM_PERF_REVISION:
value = i915_perf_ioctl_version();
break;
+ case I915_PARAM_OA_TIMESTAMP_FREQUENCY:
+ value = i915_perf_oa_timestamp_frequency(i915);
+ break;
default:
DRM_DEBUG("Unknown parameter %d\n", param->param);
return -EINVAL;
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index b725f204f496..d625962c4c31 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -3112,6 +3112,30 @@ get_sseu_config(struct intel_sseu *out_sseu,
return i915_gem_user_to_context_sseu(engine->gt, drm_sseu, out_sseu);
}
+/*
+ * OA timestamp frequency = CS timestamp frequency in most platforms. On some
+ * platforms OA unit ignores the CTC_SHIFT and the 2 timestamps differ. In such
+ * cases, return the adjusted CS timestamp frequency to the user.
+ */
+u32 i915_perf_oa_timestamp_frequency(struct drm_i915_private *i915)
+{
+ /* Wa_18013179988:dg2 */
+ if (IS_DG2(i915)) {
+ intel_wakeref_t wakeref;
+ u32 reg, shift;
+
+ with_intel_runtime_pm(to_gt(i915)->uncore->rpm, wakeref)
+ reg = intel_uncore_read(to_gt(i915)->uncore, RPM_CONFIG0);
+
+ shift = REG_FIELD_GET(GEN10_RPM_CONFIG0_CTC_SHIFT_PARAMETER_MASK,
+ reg);
+
+ return to_gt(i915)->clock_frequency << (3 - shift);
+ }
+
+ return to_gt(i915)->clock_frequency;
+}
+
/**
* i915_oa_stream_init - validate combined props for OA stream and init
* @stream: An i915 perf stream
@@ -3833,8 +3857,10 @@ i915_perf_open_ioctl_locked(struct i915_perf *perf,
static u64 oa_exponent_to_ns(struct i915_perf *perf, int exponent)
{
- return intel_gt_clock_interval_to_ns(to_gt(perf->i915),
- 2ULL << exponent);
+ u64 nom = (2ULL << exponent) * NSEC_PER_SEC;
+ u32 den = i915_perf_oa_timestamp_frequency(perf->i915);
+
+ return div_u64(nom + den - 1, den);
}
static __always_inline bool
diff --git a/drivers/gpu/drm/i915/i915_perf.h b/drivers/gpu/drm/i915/i915_perf.h
index 1d1329e5af3a..f96e09a4af04 100644
--- a/drivers/gpu/drm/i915/i915_perf.h
+++ b/drivers/gpu/drm/i915/i915_perf.h
@@ -57,4 +57,6 @@ static inline void i915_oa_config_put(struct i915_oa_config *oa_config)
kref_put(&oa_config->ref, i915_oa_config_release);
}
+u32 i915_perf_oa_timestamp_frequency(struct drm_i915_private *i915);
+
#endif /* __I915_PERF_H__ */
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 7cad631040d3..17dab5a0f35d 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -765,6 +765,12 @@ typedef struct drm_i915_irq_wait {
/* Query if the kernel supports the I915_USERPTR_PROBE flag. */
#define I915_PARAM_HAS_USERPTR_PROBE 56
+/*
+ * Frequency of the timestamps in OA reports. This used to be the same as the CS
+ * timestamp frequency, but differs on some platforms.
+ */
+#define I915_PARAM_OA_TIMESTAMP_FREQUENCY 57
+
/* Must be kept compact -- no holes and well documented */
/**
--
2.25.1
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [Intel-gfx] [PATCH v4 13/16] drm/i915/perf: Save/restore EU flex counters across reset
2022-10-12 22:27 [Intel-gfx] [PATCH v4 00/16] Add DG2 OA support Umesh Nerlige Ramappa
` (11 preceding siblings ...)
2022-10-12 22:27 ` [Intel-gfx] [PATCH v4 12/16] drm/i915/perf: Apply Wa_18013179988 Umesh Nerlige Ramappa
@ 2022-10-12 22:27 ` Umesh Nerlige Ramappa
2022-10-12 22:27 ` [Intel-gfx] [PATCH v4 14/16] drm/i915/guc: Support OA when Wa_16011777198 is enabled Umesh Nerlige Ramappa
` (6 subsequent siblings)
19 siblings, 0 replies; 23+ messages in thread
From: Umesh Nerlige Ramappa @ 2022-10-12 22:27 UTC (permalink / raw)
To: intel-gfx, Lionel G Landwerlin, Ashutosh Dixit
If a drm client is killed, then hw contexts used by the client are reset
immediately. This reset clears the EU flex counter configuration. If an
OA use case is running in parallel, it would start seeing zeroed eu
counter values following the reset even if the drm client is restarted.
Save/restore the EU flex counter config so that the EU counters can be
monitored continuously across resets.
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
---
drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
index 657f0beb8e06..6d9996e24cce 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
@@ -376,6 +376,14 @@ static int guc_mmio_regset_init(struct temp_regset *regset,
for (i = 0; i < GEN9_LNCFCMOCS_REG_COUNT; i++)
ret |= GUC_MMIO_REG_ADD(gt, regset, GEN9_LNCFCMOCS(i), false);
+ ret |= GUC_MMIO_REG_ADD(gt, regset, EU_PERF_CNTL0, false);
+ ret |= GUC_MMIO_REG_ADD(gt, regset, EU_PERF_CNTL1, false);
+ ret |= GUC_MMIO_REG_ADD(gt, regset, EU_PERF_CNTL2, false);
+ ret |= GUC_MMIO_REG_ADD(gt, regset, EU_PERF_CNTL3, false);
+ ret |= GUC_MMIO_REG_ADD(gt, regset, EU_PERF_CNTL4, false);
+ ret |= GUC_MMIO_REG_ADD(gt, regset, EU_PERF_CNTL5, false);
+ ret |= GUC_MMIO_REG_ADD(gt, regset, EU_PERF_CNTL6, false);
+
return ret ? -1 : 0;
}
--
2.25.1
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [Intel-gfx] [PATCH v4 14/16] drm/i915/guc: Support OA when Wa_16011777198 is enabled
2022-10-12 22:27 [Intel-gfx] [PATCH v4 00/16] Add DG2 OA support Umesh Nerlige Ramappa
` (12 preceding siblings ...)
2022-10-12 22:27 ` [Intel-gfx] [PATCH v4 13/16] drm/i915/perf: Save/restore EU flex counters across reset Umesh Nerlige Ramappa
@ 2022-10-12 22:27 ` Umesh Nerlige Ramappa
2022-10-12 22:27 ` [Intel-gfx] [PATCH v4 15/16] drm/i915/perf: complete programming whitelisting for XEHPSDV Umesh Nerlige Ramappa
` (5 subsequent siblings)
19 siblings, 0 replies; 23+ messages in thread
From: Umesh Nerlige Ramappa @ 2022-10-12 22:27 UTC (permalink / raw)
To: intel-gfx, Lionel G Landwerlin, Ashutosh Dixit
From: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
On DG2, a w/a resets RCS/CCS before it goes into RC6. This breaks OA
since OA does not expect engine resets during its use. Fix it by
disabling RC6.
v2: (Ashutosh)
- Bring back slpc_unset_param helper
- Update commit msg
- Use with_intel_runtime_pm helper for set/unset
v3: (Ashutosh)
- Just use intel_uc_uses_guc_rc
Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
---
.../drm/i915/gt/uc/abi/guc_actions_slpc_abi.h | 9 +++
drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c | 66 +++++++++++++++++++
drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.h | 2 +
drivers/gpu/drm/i915/i915_perf.c | 27 ++++++++
4 files changed, 104 insertions(+)
diff --git a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_slpc_abi.h b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_slpc_abi.h
index 4c840a2639dc..811add10c30d 100644
--- a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_slpc_abi.h
+++ b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_slpc_abi.h
@@ -128,6 +128,15 @@ enum slpc_media_ratio_mode {
SLPC_MEDIA_RATIO_MODE_FIXED_ONE_TO_TWO = 2,
};
+enum slpc_gucrc_mode {
+ SLPC_GUCRC_MODE_HW = 0,
+ SLPC_GUCRC_MODE_GUCRC_NO_RC6 = 1,
+ SLPC_GUCRC_MODE_GUCRC_STATIC_TIMEOUT = 2,
+ SLPC_GUCRC_MODE_GUCRC_DYNAMIC_HYSTERESIS = 3,
+
+ SLPC_GUCRC_MODE_MAX,
+};
+
enum slpc_event_id {
SLPC_EVENT_RESET = 0,
SLPC_EVENT_SHUTDOWN = 1,
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c
index fdd895f73f9f..b3a4fb9e021f 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c
@@ -137,6 +137,17 @@ static int guc_action_slpc_set_param(struct intel_guc *guc, u8 id, u32 value)
return ret > 0 ? -EPROTO : ret;
}
+static int guc_action_slpc_unset_param(struct intel_guc *guc, u8 id)
+{
+ u32 request[] = {
+ GUC_ACTION_HOST2GUC_PC_SLPC_REQUEST,
+ SLPC_EVENT(SLPC_EVENT_PARAMETER_UNSET, 1),
+ id,
+ };
+
+ return intel_guc_send(guc, request, ARRAY_SIZE(request));
+}
+
static bool slpc_is_running(struct intel_guc_slpc *slpc)
{
return slpc_get_state(slpc) == SLPC_GLOBAL_STATE_RUNNING;
@@ -190,6 +201,15 @@ static int slpc_set_param(struct intel_guc_slpc *slpc, u8 id, u32 value)
return ret;
}
+static int slpc_unset_param(struct intel_guc_slpc *slpc, u8 id)
+{
+ struct intel_guc *guc = slpc_to_guc(slpc);
+
+ GEM_BUG_ON(id >= SLPC_MAX_PARAM);
+
+ return guc_action_slpc_unset_param(guc, id);
+}
+
static int slpc_force_min_freq(struct intel_guc_slpc *slpc, u32 freq)
{
struct drm_i915_private *i915 = slpc_to_i915(slpc);
@@ -610,6 +630,52 @@ static void slpc_get_rp_values(struct intel_guc_slpc *slpc)
slpc->boost_freq = slpc->rp0_freq;
}
+/**
+ * intel_guc_slpc_override_gucrc_mode() - override GUCRC mode
+ * @slpc: pointer to intel_guc_slpc.
+ * @mode: new value of the mode.
+ *
+ * This function will override the GUCRC mode.
+ *
+ * Return: 0 on success, non-zero error code on failure.
+ */
+int intel_guc_slpc_override_gucrc_mode(struct intel_guc_slpc *slpc, u32 mode)
+{
+ int ret;
+ struct drm_i915_private *i915 = slpc_to_i915(slpc);
+ intel_wakeref_t wakeref;
+
+ if (mode >= SLPC_GUCRC_MODE_MAX)
+ return -EINVAL;
+
+ with_intel_runtime_pm(&i915->runtime_pm, wakeref) {
+ ret = slpc_set_param(slpc, SLPC_PARAM_PWRGATE_RC_MODE, mode);
+ if (ret)
+ drm_err(&i915->drm,
+ "Override gucrc mode %d failed %d\n",
+ mode, ret);
+ }
+
+ return ret;
+}
+
+int intel_guc_slpc_unset_gucrc_mode(struct intel_guc_slpc *slpc)
+{
+ struct drm_i915_private *i915 = slpc_to_i915(slpc);
+ intel_wakeref_t wakeref;
+ int ret = 0;
+
+ with_intel_runtime_pm(&i915->runtime_pm, wakeref) {
+ ret = slpc_unset_param(slpc, SLPC_PARAM_PWRGATE_RC_MODE);
+ if (ret)
+ drm_err(&i915->drm,
+ "Unsetting gucrc mode failed %d\n",
+ ret);
+ }
+
+ return ret;
+}
+
/*
* intel_guc_slpc_enable() - Start SLPC
* @slpc: pointer to intel_guc_slpc.
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.h
index 82a98f78f96c..ccf483730d9d 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.h
@@ -42,5 +42,7 @@ int intel_guc_slpc_set_media_ratio_mode(struct intel_guc_slpc *slpc, u32 val);
void intel_guc_pm_intrmsk_enable(struct intel_gt *gt);
void intel_guc_slpc_boost(struct intel_guc_slpc *slpc);
void intel_guc_slpc_dec_waiters(struct intel_guc_slpc *slpc);
+int intel_guc_slpc_unset_gucrc_mode(struct intel_guc_slpc *slpc);
+int intel_guc_slpc_override_gucrc_mode(struct intel_guc_slpc *slpc, u32 mode);
#endif
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index d625962c4c31..4bec533a5b30 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -208,6 +208,7 @@
#include "gt/intel_lrc.h"
#include "gt/intel_lrc_reg.h"
#include "gt/intel_ring.h"
+#include "gt/uc/intel_guc_slpc.h"
#include "i915_drv.h"
#include "i915_file_private.h"
@@ -1585,6 +1586,15 @@ static void i915_oa_stream_destroy(struct i915_perf_stream *stream)
free_oa_buffer(stream);
+ /*
+ * Wa_16011777198:dg2: Unset the override of GUCRC mode to enable rc6.
+ */
+ if (intel_uc_uses_guc_rc(>->uc) &&
+ (IS_DG2_GRAPHICS_STEP(gt->i915, G10, STEP_A0, STEP_C0) ||
+ IS_DG2_GRAPHICS_STEP(gt->i915, G11, STEP_A0, STEP_B0)))
+ drm_WARN_ON(>->i915->drm,
+ intel_guc_slpc_unset_gucrc_mode(>->uc.guc.slpc));
+
intel_uncore_forcewake_put(stream->uncore, FORCEWAKE_ALL);
intel_engine_pm_put(stream->engine);
@@ -3268,6 +3278,23 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
intel_engine_pm_get(stream->engine);
intel_uncore_forcewake_get(stream->uncore, FORCEWAKE_ALL);
+ /*
+ * Wa_16011777198:dg2: GuC resets render as part of the Wa. This causes
+ * OA to lose the configuration state. Prevent this by overriding GUCRC
+ * mode.
+ */
+ if (intel_uc_uses_guc_rc(>->uc) &&
+ (IS_DG2_GRAPHICS_STEP(gt->i915, G10, STEP_A0, STEP_C0) ||
+ IS_DG2_GRAPHICS_STEP(gt->i915, G11, STEP_A0, STEP_B0))) {
+ ret = intel_guc_slpc_override_gucrc_mode(>->uc.guc.slpc,
+ SLPC_GUCRC_MODE_GUCRC_NO_RC6);
+ if (ret) {
+ drm_dbg(&stream->perf->i915->drm,
+ "Unable to override gucrc mode\n");
+ goto err_config;
+ }
+ }
+
ret = alloc_oa_buffer(stream);
if (ret)
goto err_oa_buf_alloc;
--
2.25.1
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [Intel-gfx] [PATCH v4 15/16] drm/i915/perf: complete programming whitelisting for XEHPSDV
2022-10-12 22:27 [Intel-gfx] [PATCH v4 00/16] Add DG2 OA support Umesh Nerlige Ramappa
` (13 preceding siblings ...)
2022-10-12 22:27 ` [Intel-gfx] [PATCH v4 14/16] drm/i915/guc: Support OA when Wa_16011777198 is enabled Umesh Nerlige Ramappa
@ 2022-10-12 22:27 ` Umesh Nerlige Ramappa
2022-10-12 22:27 ` [Intel-gfx] [PATCH v4 16/16] drm/i915/perf: Enable OA for DG2 Umesh Nerlige Ramappa
` (4 subsequent siblings)
19 siblings, 0 replies; 23+ messages in thread
From: Umesh Nerlige Ramappa @ 2022-10-12 22:27 UTC (permalink / raw)
To: intel-gfx, Lionel G Landwerlin, Ashutosh Dixit
From: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
We have an additional register to select which slices contribute to
OAG/OAG counter increments.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
---
drivers/gpu/drm/i915/i915_drv.h | 2 ++
drivers/gpu/drm/i915/i915_pci.c | 1 +
drivers/gpu/drm/i915/i915_perf.c | 13 +++++++++++++
drivers/gpu/drm/i915/intel_device_info.h | 1 +
4 files changed, 17 insertions(+)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 4de2a20b664b..15332a22481b 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -898,6 +898,8 @@ IS_SUBPLATFORM(const struct drm_i915_private *i915,
#define HAS_OA_BPC_REPORTING(dev_priv) \
(INTEL_INFO(dev_priv)->has_oa_bpc_reporting)
+#define HAS_OA_SLICE_CONTRIB_LIMITS(dev_priv) \
+ (INTEL_INFO(dev_priv)->has_oa_slice_contrib_limits)
/*
* Set this flag, when platform requires 64K GTT page sizes or larger for
diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
index 196ece92845b..deb7fccf111a 100644
--- a/drivers/gpu/drm/i915/i915_pci.c
+++ b/drivers/gpu/drm/i915/i915_pci.c
@@ -1024,6 +1024,7 @@ static const struct intel_device_info adl_p_info = {
.has_logical_ring_elsq = 1, \
.has_mslice_steering = 1, \
.has_oa_bpc_reporting = 1, \
+ .has_oa_slice_contrib_limits = 1, \
.has_rc6 = 1, \
.has_reset_engine = 1, \
.has_rps = 1, \
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 4bec533a5b30..abfed2a98a8b 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -4264,6 +4264,11 @@ static const struct i915_range gen12_oa_b_counters[] = {
{}
};
+static const struct i915_range xehp_oa_b_counters[] = {
+ { .start = 0xdc48, .end = 0xdc48 }, /* OAA_ENABLE_REG */
+ { .start = 0xdd00, .end = 0xdd48 }, /* OAG_LCE0_0 - OAA_LENABLE_REG */
+};
+
static const struct i915_range gen7_oa_mux_regs[] = {
{ .start = 0x91b8, .end = 0x91cc }, /* OA_PERFCNT[1-2], OA_PERFMATRIX */
{ .start = 0x9800, .end = 0x9888 }, /* MICRO_BP0_0 - NOA_WRITE */
@@ -4338,6 +4343,12 @@ static bool gen12_is_valid_b_counter_addr(struct i915_perf *perf, u32 addr)
return reg_in_range_table(addr, gen12_oa_b_counters);
}
+static bool xehp_is_valid_b_counter_addr(struct i915_perf *perf, u32 addr)
+{
+ return reg_in_range_table(addr, xehp_oa_b_counters) ||
+ reg_in_range_table(addr, gen12_oa_b_counters);
+}
+
static bool gen12_is_valid_mux_addr(struct i915_perf *perf, u32 addr)
{
return reg_in_range_table(addr, gen12_oa_mux_regs);
@@ -4850,6 +4861,8 @@ void i915_perf_init(struct drm_i915_private *i915)
perf->ops.oa_hw_tail_read = gen8_oa_hw_tail_read;
} else if (GRAPHICS_VER(i915) == 12) {
perf->ops.is_valid_b_counter_reg =
+ HAS_OA_SLICE_CONTRIB_LIMITS(i915) ?
+ xehp_is_valid_b_counter_addr :
gen12_is_valid_b_counter_addr;
perf->ops.is_valid_mux_reg =
gen12_is_valid_mux_addr;
diff --git a/drivers/gpu/drm/i915/intel_device_info.h b/drivers/gpu/drm/i915/intel_device_info.h
index 25f7f375a3c4..7b902ef5451e 100644
--- a/drivers/gpu/drm/i915/intel_device_info.h
+++ b/drivers/gpu/drm/i915/intel_device_info.h
@@ -166,6 +166,7 @@ enum intel_ppgtt_type {
func(has_media_ratio_mode); \
func(has_mslice_steering); \
func(has_oa_bpc_reporting); \
+ func(has_oa_slice_contrib_limits); \
func(has_one_eu_per_fuse_bit); \
func(has_pxp); \
func(has_rc6); \
--
2.25.1
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [Intel-gfx] [PATCH v4 16/16] drm/i915/perf: Enable OA for DG2
2022-10-12 22:27 [Intel-gfx] [PATCH v4 00/16] Add DG2 OA support Umesh Nerlige Ramappa
` (14 preceding siblings ...)
2022-10-12 22:27 ` [Intel-gfx] [PATCH v4 15/16] drm/i915/perf: complete programming whitelisting for XEHPSDV Umesh Nerlige Ramappa
@ 2022-10-12 22:27 ` Umesh Nerlige Ramappa
2022-10-12 22:54 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for Add DG2 OA support (rev6) Patchwork
` (3 subsequent siblings)
19 siblings, 0 replies; 23+ messages in thread
From: Umesh Nerlige Ramappa @ 2022-10-12 22:27 UTC (permalink / raw)
To: intel-gfx, Lionel G Landwerlin, Ashutosh Dixit
OA was disabled for DG2 as support was missing. Enable it back now.
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
---
drivers/gpu/drm/i915/i915_perf.c | 6 ------
1 file changed, 6 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index abfed2a98a8b..3251c2a31b4c 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -4801,12 +4801,6 @@ void i915_perf_init(struct drm_i915_private *i915)
{
struct i915_perf *perf = &i915->perf;
- /* XXX const struct i915_perf_ops! */
-
- /* i915_perf is not enabled for DG2 yet */
- if (IS_DG2(i915))
- return;
-
perf->oa_formats = oa_formats;
if (IS_HASWELL(i915)) {
perf->ops.is_valid_b_counter_reg = gen7_is_valid_b_counter_addr;
--
2.25.1
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for Add DG2 OA support (rev6)
2022-10-12 22:27 [Intel-gfx] [PATCH v4 00/16] Add DG2 OA support Umesh Nerlige Ramappa
` (15 preceding siblings ...)
2022-10-12 22:27 ` [Intel-gfx] [PATCH v4 16/16] drm/i915/perf: Enable OA for DG2 Umesh Nerlige Ramappa
@ 2022-10-12 22:54 ` Patchwork
2022-10-12 22:54 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
` (2 subsequent siblings)
19 siblings, 0 replies; 23+ messages in thread
From: Patchwork @ 2022-10-12 22:54 UTC (permalink / raw)
To: Umesh Nerlige Ramappa; +Cc: intel-gfx
== Series Details ==
Series: Add DG2 OA support (rev6)
URL : https://patchwork.freedesktop.org/series/107584/
State : warning
== Summary ==
Error: dim checkpatch failed
417244c8ec48 drm/i915/perf: Fix OA filtering logic for GuC mode
09d47558b4c6 drm/i915/perf: Add 32-bit OAG and OAR formats for DG2
9480d2f8e988 drm/i915/perf: Fix noa wait predication for DG2
c2feb391c29e drm/i915/perf: Determine gen12 oa ctx offset at runtime
-:90: WARNING:REPEATED_WORD: Possible repeated word: 'the'
#90: FILE: drivers/gpu/drm/i915/i915_perf.c:1399:
+ * Note that the the reg value is written to 'offset + 1' location, so
total: 0 errors, 1 warnings, 0 checks, 241 lines checked
8ac3c122a123 drm/i915/perf: Enable bytes per clock reporting in OA
06207328a44b drm/i915/perf: Simply use stream->ctx
9188e237c461 drm/i915/perf: Move gt-specific data from i915->perf to gt->perf
9adfddade265 drm/i915/perf: Replace gt->perf.lock with stream->lock for file ops
2c0312c26728 drm/i915/perf: Use gt-specific ggtt for OA and noa-wait buffers
0db017205809 drm/i915/perf: Store a pointer to oa_format in oa_buffer
7ae385f5b365 drm/i915/perf: Add Wa_1508761755:dg2
e67ffa844773 drm/i915/perf: Apply Wa_18013179988
b6dae18b26f2 drm/i915/perf: Save/restore EU flex counters across reset
57e06b6e1a28 drm/i915/guc: Support OA when Wa_16011777198 is enabled
e83c47177a64 drm/i915/perf: complete programming whitelisting for XEHPSDV
a22efa535d14 drm/i915/perf: Enable OA for DG2
^ permalink raw reply [flat|nested] 23+ messages in thread
* [Intel-gfx] ✗ Fi.CI.SPARSE: warning for Add DG2 OA support (rev6)
2022-10-12 22:27 [Intel-gfx] [PATCH v4 00/16] Add DG2 OA support Umesh Nerlige Ramappa
` (16 preceding siblings ...)
2022-10-12 22:54 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for Add DG2 OA support (rev6) Patchwork
@ 2022-10-12 22:54 ` Patchwork
2022-10-12 23:03 ` [Intel-gfx] ✗ Fi.CI.BAT: failure " Patchwork
2022-10-17 22:05 ` [Intel-gfx] ✗ Fi.CI.BUILD: failure for Add DG2 OA support (rev7) Patchwork
19 siblings, 0 replies; 23+ messages in thread
From: Patchwork @ 2022-10-12 22:54 UTC (permalink / raw)
To: Umesh Nerlige Ramappa; +Cc: intel-gfx
== Series Details ==
Series: Add DG2 OA support (rev6)
URL : https://patchwork.freedesktop.org/series/107584/
State : warning
== Summary ==
Error: dim sparse failed
Sparse version: v0.6.2
Fast mode used, each commit won't be checked separately.
^ permalink raw reply [flat|nested] 23+ messages in thread
* [Intel-gfx] ✗ Fi.CI.BAT: failure for Add DG2 OA support (rev6)
2022-10-12 22:27 [Intel-gfx] [PATCH v4 00/16] Add DG2 OA support Umesh Nerlige Ramappa
` (17 preceding siblings ...)
2022-10-12 22:54 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
@ 2022-10-12 23:03 ` Patchwork
2022-10-17 22:05 ` [Intel-gfx] ✗ Fi.CI.BUILD: failure for Add DG2 OA support (rev7) Patchwork
19 siblings, 0 replies; 23+ messages in thread
From: Patchwork @ 2022-10-12 23:03 UTC (permalink / raw)
To: Umesh Nerlige Ramappa; +Cc: intel-gfx
[-- Attachment #1: Type: text/plain, Size: 6073 bytes --]
== Series Details ==
Series: Add DG2 OA support (rev6)
URL : https://patchwork.freedesktop.org/series/107584/
State : failure
== Summary ==
CI Bug Log - changes from CI_DRM_12238 -> Patchwork_107584v6
====================================================
Summary
-------
**FAILURE**
Serious unknown changes coming with Patchwork_107584v6 absolutely need to be
verified manually.
If you think the reported changes have nothing to do with the changes
introduced in Patchwork_107584v6, please notify your bug team to allow them
to document this new failure mode, which will reduce false positives in CI.
External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_107584v6/index.html
Participating hosts (45 -> 12)
------------------------------
ERROR: It appears as if the changes made in Patchwork_107584v6 prevented too many machines from booting.
Missing (33): fi-rkl-11600 fi-rkl-guc fi-bdw-gvtdvm fi-icl-u2 fi-apl-guc fi-snb-2520m fi-pnv-d510 fi-blb-e6850 fi-skl-6600u fi-snb-2600 fi-bxt-dsi fi-bdw-5557u fi-adl-ddr5 fi-hsw-g3258 fi-ilk-650 fi-ctg-p8600 fi-hsw-4770 bat-atsm-1 fi-ivb-3770 fi-elk-e7500 fi-bsw-nick fi-skl-6700k2 fi-tgl-mst fi-kbl-7567u fi-skl-guc fi-cfl-8700k fi-glk-j4005 fi-ehl-2 fi-jsl-1 fi-cfl-guc fi-kbl-guc fi-cfl-8109u fi-kbl-8809g
Known issues
------------
Here are the changes found in Patchwork_107584v6 that come from known issues:
### IGT changes ###
#### Possible fixes ####
* igt@gem_exec_suspend@basic-s0@smem:
- {bat-rpls-2}: [DMESG-WARN][1] ([i915#6434]) -> [PASS][2]
[1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12238/bat-rpls-2/igt@gem_exec_suspend@basic-s0@smem.html
[2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_107584v6/bat-rpls-2/igt@gem_exec_suspend@basic-s0@smem.html
* igt@gem_exec_suspend@basic-s3@lmem0:
- {bat-dg2-11}: [DMESG-WARN][3] ([i915#6816]) -> [PASS][4]
[3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12238/bat-dg2-11/igt@gem_exec_suspend@basic-s3@lmem0.html
[4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_107584v6/bat-dg2-11/igt@gem_exec_suspend@basic-s3@lmem0.html
* igt@gem_huc_copy@huc-copy:
- {bat-dg2-8}: [FAIL][5] ([i915#7029]) -> [PASS][6]
[5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12238/bat-dg2-8/igt@gem_huc_copy@huc-copy.html
[6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_107584v6/bat-dg2-8/igt@gem_huc_copy@huc-copy.html
* igt@i915_module_load@load:
- {bat-dg2-8}: [DMESG-WARN][7] ([i915#7031]) -> [PASS][8]
[7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12238/bat-dg2-8/igt@i915_module_load@load.html
[8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_107584v6/bat-dg2-8/igt@i915_module_load@load.html
* igt@i915_module_load@reload:
- {bat-rpls-2}: [DMESG-WARN][9] ([i915#5537]) -> [PASS][10]
[9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12238/bat-rpls-2/igt@i915_module_load@reload.html
[10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_107584v6/bat-rpls-2/igt@i915_module_load@reload.html
* igt@i915_pm_rpm@basic-rte:
- {bat-rplp-1}: [DMESG-WARN][11] ([i915#7077]) -> [PASS][12]
[11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12238/bat-rplp-1/igt@i915_pm_rpm@basic-rte.html
[12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_107584v6/bat-rplp-1/igt@i915_pm_rpm@basic-rte.html
{name}: This element is suppressed. This means it is ignored when computing
the status of the difference (SUCCESS, WARNING, or FAILURE).
[fdo#111827]: https://bugs.freedesktop.org/show_bug.cgi?id=111827
[i915#1845]: https://gitlab.freedesktop.org/drm/intel/issues/1845
[i915#2867]: https://gitlab.freedesktop.org/drm/intel/issues/2867
[i915#3708]: https://gitlab.freedesktop.org/drm/intel/issues/3708
[i915#4613]: https://gitlab.freedesktop.org/drm/intel/issues/4613
[i915#5537]: https://gitlab.freedesktop.org/drm/intel/issues/5537
[i915#6434]: https://gitlab.freedesktop.org/drm/intel/issues/6434
[i915#6621]: https://gitlab.freedesktop.org/drm/intel/issues/6621
[i915#6816]: https://gitlab.freedesktop.org/drm/intel/issues/6816
[i915#7029]: https://gitlab.freedesktop.org/drm/intel/issues/7029
[i915#7031]: https://gitlab.freedesktop.org/drm/intel/issues/7031
[i915#7077]: https://gitlab.freedesktop.org/drm/intel/issues/7077
Build changes
-------------
* IGT: IGT_7012 -> IGTPW_7832
* Linux: CI_DRM_12238 -> Patchwork_107584v6
CI-20190529: 20190529
CI_DRM_12238: 87844a66c86d81ccc88314bd68b9f7292d6b32a5 @ git://anongit.freedesktop.org/gfx-ci/linux
IGTPW_7832: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_7832/index.html
IGT_7012: ca6f5bdd537d26692c4b1ca011b8c4f227d95703 @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
Patchwork_107584v6: 87844a66c86d81ccc88314bd68b9f7292d6b32a5 @ git://anongit.freedesktop.org/gfx-ci/linux
### Linux commits
dc4d509eacc2 drm/i915/perf: Enable OA for DG2
ad9961613d57 drm/i915/perf: complete programming whitelisting for XEHPSDV
1abaa3c7cf1d drm/i915/guc: Support OA when Wa_16011777198 is enabled
0c10ab21f0ca drm/i915/perf: Save/restore EU flex counters across reset
7edf5a50a432 drm/i915/perf: Apply Wa_18013179988
d2eb679e065f drm/i915/perf: Add Wa_1508761755:dg2
f18e6f9554b5 drm/i915/perf: Store a pointer to oa_format in oa_buffer
a3cbc3a8586b drm/i915/perf: Use gt-specific ggtt for OA and noa-wait buffers
b25358d6c3c9 drm/i915/perf: Replace gt->perf.lock with stream->lock for file ops
516ec42c2804 drm/i915/perf: Move gt-specific data from i915->perf to gt->perf
80426c1b41fd drm/i915/perf: Simply use stream->ctx
e59be7d4442f drm/i915/perf: Enable bytes per clock reporting in OA
01761dd8a3a6 drm/i915/perf: Determine gen12 oa ctx offset at runtime
b384d35d4b60 drm/i915/perf: Fix noa wait predication for DG2
7934a7851efc drm/i915/perf: Add 32-bit OAG and OAR formats for DG2
e408d2a55c17 drm/i915/perf: Fix OA filtering logic for GuC mode
== Logs ==
For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_107584v6/index.html
[-- Attachment #2: Type: text/html, Size: 6561 bytes --]
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Intel-gfx] [PATCH v4 04/16] drm/i915/perf: Determine gen12 oa ctx offset at runtime
2022-10-12 22:27 ` [Intel-gfx] [PATCH v4 04/16] drm/i915/perf: Determine gen12 oa ctx offset at runtime Umesh Nerlige Ramappa
@ 2022-10-12 23:46 ` Dixit, Ashutosh
2022-10-13 17:05 ` Umesh Nerlige Ramappa
0 siblings, 1 reply; 23+ messages in thread
From: Dixit, Ashutosh @ 2022-10-12 23:46 UTC (permalink / raw)
To: Umesh Nerlige Ramappa; +Cc: intel-gfx
On Wed, 12 Oct 2022 15:27:27 -0700, Umesh Nerlige Ramappa wrote:
>
> +static u32 oa_context_image_offset(struct intel_context *ce, u32 reg)
> +{
> + u32 offset, len = (ce->engine->context_size - PAGE_SIZE) / 4;
> + u32 *state = ce->lrc_reg_state;
> +
> + for (offset = 0; offset < len; ) {
> + if (IS_MI_LRI_CMD(state[offset])) {
> + /*
> + * We expect reg-value pairs in MI_LRI command, so
> + * MI_LRI_LEN() should be even, if not, issue a warning.
> + */
> + drm_WARN_ON(&ce->engine->i915->drm,
> + MI_LRI_LEN(state[offset]) & 0x1);
> +
> + if (oa_find_reg_in_lri(state, reg, &offset, len))
> + break;
> + } else {
> + offset++;
> + }
> + }
> +
> + /*
> + * Note that the the reg value is written to 'offset + 1' location, so
> + * ensure that the offset is less than (len - 1).
> + */
> + return offset < len ? offset : U32_MAX;
Should this then be 'offset < len - 1'? It is actually equivalent since
length is even and we do idx += 2 in oa_find_reg_in_lri so 'offset < len'
is the same as 'offset < len - 1' but since we mention (len - 1) in the
comment maybe clearer to use 'offset < len - 1'?
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Intel-gfx] [PATCH v4 04/16] drm/i915/perf: Determine gen12 oa ctx offset at runtime
2022-10-12 23:46 ` Dixit, Ashutosh
@ 2022-10-13 17:05 ` Umesh Nerlige Ramappa
0 siblings, 0 replies; 23+ messages in thread
From: Umesh Nerlige Ramappa @ 2022-10-13 17:05 UTC (permalink / raw)
To: Dixit, Ashutosh; +Cc: intel-gfx
On Wed, Oct 12, 2022 at 04:46:07PM -0700, Dixit, Ashutosh wrote:
>On Wed, 12 Oct 2022 15:27:27 -0700, Umesh Nerlige Ramappa wrote:
>>
>> +static u32 oa_context_image_offset(struct intel_context *ce, u32 reg)
>> +{
>> + u32 offset, len = (ce->engine->context_size - PAGE_SIZE) / 4;
>> + u32 *state = ce->lrc_reg_state;
>> +
>> + for (offset = 0; offset < len; ) {
>> + if (IS_MI_LRI_CMD(state[offset])) {
>> + /*
>> + * We expect reg-value pairs in MI_LRI command, so
>> + * MI_LRI_LEN() should be even, if not, issue a warning.
>> + */
>> + drm_WARN_ON(&ce->engine->i915->drm,
>> + MI_LRI_LEN(state[offset]) & 0x1);
>> +
>> + if (oa_find_reg_in_lri(state, reg, &offset, len))
>> + break;
>> + } else {
>> + offset++;
>> + }
>> + }
>> +
>> + /*
>> + * Note that the the reg value is written to 'offset + 1' location, so
>> + * ensure that the offset is less than (len - 1).
>> + */
>> + return offset < len ? offset : U32_MAX;
>
>Should this then be 'offset < len - 1'? It is actually equivalent since
>length is even and we do idx += 2 in oa_find_reg_in_lri so 'offset < len'
>is the same as 'offset < len - 1' but since we mention (len - 1) in the
>comment maybe clearer to use 'offset < len - 1'?
I was experimenting with len - 1, but removed it later. Forgot to remove
the comment. I will drop the comment.
Thanks,
Umesh
^ permalink raw reply [flat|nested] 23+ messages in thread
* [Intel-gfx] ✗ Fi.CI.BUILD: failure for Add DG2 OA support (rev7)
2022-10-12 22:27 [Intel-gfx] [PATCH v4 00/16] Add DG2 OA support Umesh Nerlige Ramappa
` (18 preceding siblings ...)
2022-10-12 23:03 ` [Intel-gfx] ✗ Fi.CI.BAT: failure " Patchwork
@ 2022-10-17 22:05 ` Patchwork
19 siblings, 0 replies; 23+ messages in thread
From: Patchwork @ 2022-10-17 22:05 UTC (permalink / raw)
To: Umesh Nerlige Ramappa; +Cc: intel-gfx
== Series Details ==
Series: Add DG2 OA support (rev7)
URL : https://patchwork.freedesktop.org/series/107584/
State : failure
== Summary ==
Error: make failed
CALL scripts/checksyscalls.sh
DESCEND objtool
CC [M] drivers/gpu/drm/i915/i915_perf.o
In file included from drivers/gpu/drm/i915/i915_perf.c:207:
drivers/gpu/drm/i915/i915_perf.c: In function ‘gen12_enable_metric_set’:
drivers/gpu/drm/i915/gt/intel_gt_regs.h:11:25: error: incompatible type for argument 2 of ‘intel_uncore_write’
#define MCR_REG(offset) ((const i915_mcr_reg_t){ .reg = (offset) })
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/i915/gt/intel_gt_regs.h:1150:28: note: in expansion of macro ‘MCR_REG’
#define GEN8_ROW_CHICKEN MCR_REG(0xe4f0)
^~~~~~~
drivers/gpu/drm/i915/i915_perf.c:2798:30: note: in expansion of macro ‘GEN8_ROW_CHICKEN’
intel_uncore_write(uncore, GEN8_ROW_CHICKEN,
^~~~~~~~~~~~~~~~
In file included from ./drivers/gpu/drm/i915/gt/intel_engine_types.h:26,
from ./drivers/gpu/drm/i915/gt/intel_context_types.h:18,
from drivers/gpu/drm/i915/gem/i915_gem_context_types.h:20,
from drivers/gpu/drm/i915/gem/i915_gem_context.h:10,
from drivers/gpu/drm/i915/i915_perf.c:198:
./drivers/gpu/drm/i915/intel_uncore.h:353:18: note: expected ‘i915_reg_t’ {aka ‘struct <anonymous>’} but argument is of type ‘i915_mcr_reg_t’ {aka ‘const struct <anonymous>’}
i915_reg_t reg, u##x__ val) \
~~~~~~~~~~~^~~
./drivers/gpu/drm/i915/intel_uncore.h:366:1: note: in expansion of macro ‘__uncore_write’
__uncore_write(write, 32, l, true)
^~~~~~~~~~~~~~
In file included from drivers/gpu/drm/i915/i915_perf.c:207:
drivers/gpu/drm/i915/i915_perf.c: In function ‘gen12_disable_metric_set’:
drivers/gpu/drm/i915/gt/intel_gt_regs.h:11:25: error: incompatible type for argument 2 of ‘intel_uncore_write’
#define MCR_REG(offset) ((const i915_mcr_reg_t){ .reg = (offset) })
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/i915/gt/intel_gt_regs.h:1150:28: note: in expansion of macro ‘MCR_REG’
#define GEN8_ROW_CHICKEN MCR_REG(0xe4f0)
^~~~~~~
drivers/gpu/drm/i915/i915_perf.c:2887:30: note: in expansion of macro ‘GEN8_ROW_CHICKEN’
intel_uncore_write(uncore, GEN8_ROW_CHICKEN,
^~~~~~~~~~~~~~~~
In file included from ./drivers/gpu/drm/i915/gt/intel_engine_types.h:26,
from ./drivers/gpu/drm/i915/gt/intel_context_types.h:18,
from drivers/gpu/drm/i915/gem/i915_gem_context_types.h:20,
from drivers/gpu/drm/i915/gem/i915_gem_context.h:10,
from drivers/gpu/drm/i915/i915_perf.c:198:
./drivers/gpu/drm/i915/intel_uncore.h:353:18: note: expected ‘i915_reg_t’ {aka ‘struct <anonymous>’} but argument is of type ‘i915_mcr_reg_t’ {aka ‘const struct <anonymous>’}
i915_reg_t reg, u##x__ val) \
~~~~~~~~~~~^~~
./drivers/gpu/drm/i915/intel_uncore.h:366:1: note: in expansion of macro ‘__uncore_write’
__uncore_write(write, 32, l, true)
^~~~~~~~~~~~~~
scripts/Makefile.build:250: recipe for target 'drivers/gpu/drm/i915/i915_perf.o' failed
make[5]: *** [drivers/gpu/drm/i915/i915_perf.o] Error 1
scripts/Makefile.build:500: recipe for target 'drivers/gpu/drm/i915' failed
make[4]: *** [drivers/gpu/drm/i915] Error 2
scripts/Makefile.build:500: recipe for target 'drivers/gpu/drm' failed
make[3]: *** [drivers/gpu/drm] Error 2
scripts/Makefile.build:500: recipe for target 'drivers/gpu' failed
make[2]: *** [drivers/gpu] Error 2
scripts/Makefile.build:500: recipe for target 'drivers' failed
make[1]: *** [drivers] Error 2
Makefile:1992: recipe for target '.' failed
make: *** [.] Error 2
^ permalink raw reply [flat|nested] 23+ messages in thread
end of thread, other threads:[~2022-10-17 22:05 UTC | newest]
Thread overview: 23+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-10-12 22:27 [Intel-gfx] [PATCH v4 00/16] Add DG2 OA support Umesh Nerlige Ramappa
2022-10-12 22:27 ` [Intel-gfx] [PATCH v4 01/16] drm/i915/perf: Fix OA filtering logic for GuC mode Umesh Nerlige Ramappa
2022-10-12 22:27 ` [Intel-gfx] [PATCH v4 02/16] drm/i915/perf: Add 32-bit OAG and OAR formats for DG2 Umesh Nerlige Ramappa
2022-10-12 22:27 ` [Intel-gfx] [PATCH v4 03/16] drm/i915/perf: Fix noa wait predication " Umesh Nerlige Ramappa
2022-10-12 22:27 ` [Intel-gfx] [PATCH v4 04/16] drm/i915/perf: Determine gen12 oa ctx offset at runtime Umesh Nerlige Ramappa
2022-10-12 23:46 ` Dixit, Ashutosh
2022-10-13 17:05 ` Umesh Nerlige Ramappa
2022-10-12 22:27 ` [Intel-gfx] [PATCH v4 05/16] drm/i915/perf: Enable bytes per clock reporting in OA Umesh Nerlige Ramappa
2022-10-12 22:27 ` [Intel-gfx] [PATCH v4 06/16] drm/i915/perf: Simply use stream->ctx Umesh Nerlige Ramappa
2022-10-12 22:27 ` [Intel-gfx] [PATCH v4 07/16] drm/i915/perf: Move gt-specific data from i915->perf to gt->perf Umesh Nerlige Ramappa
2022-10-12 22:27 ` [Intel-gfx] [PATCH v4 08/16] drm/i915/perf: Replace gt->perf.lock with stream->lock for file ops Umesh Nerlige Ramappa
2022-10-12 22:27 ` [Intel-gfx] [PATCH v4 09/16] drm/i915/perf: Use gt-specific ggtt for OA and noa-wait buffers Umesh Nerlige Ramappa
2022-10-12 22:27 ` [Intel-gfx] [PATCH v4 10/16] drm/i915/perf: Store a pointer to oa_format in oa_buffer Umesh Nerlige Ramappa
2022-10-12 22:27 ` [Intel-gfx] [PATCH v4 11/16] drm/i915/perf: Add Wa_1508761755:dg2 Umesh Nerlige Ramappa
2022-10-12 22:27 ` [Intel-gfx] [PATCH v4 12/16] drm/i915/perf: Apply Wa_18013179988 Umesh Nerlige Ramappa
2022-10-12 22:27 ` [Intel-gfx] [PATCH v4 13/16] drm/i915/perf: Save/restore EU flex counters across reset Umesh Nerlige Ramappa
2022-10-12 22:27 ` [Intel-gfx] [PATCH v4 14/16] drm/i915/guc: Support OA when Wa_16011777198 is enabled Umesh Nerlige Ramappa
2022-10-12 22:27 ` [Intel-gfx] [PATCH v4 15/16] drm/i915/perf: complete programming whitelisting for XEHPSDV Umesh Nerlige Ramappa
2022-10-12 22:27 ` [Intel-gfx] [PATCH v4 16/16] drm/i915/perf: Enable OA for DG2 Umesh Nerlige Ramappa
2022-10-12 22:54 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for Add DG2 OA support (rev6) Patchwork
2022-10-12 22:54 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
2022-10-12 23:03 ` [Intel-gfx] ✗ Fi.CI.BAT: failure " Patchwork
2022-10-17 22:05 ` [Intel-gfx] ✗ Fi.CI.BUILD: failure for Add DG2 OA support (rev7) Patchwork
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox