public inbox for intel-gfx@lists.freedesktop.org
* [Intel-gfx] [PATCH 0/6] Add MTL PMU support for multi-gt
@ 2023-05-06  0:58 Umesh Nerlige Ramappa
  2023-05-06  0:58 ` [Intel-gfx] [PATCH 1/6] drm/i915/pmu: Support PMU for all engines Umesh Nerlige Ramappa
                   ` (8 more replies)
  0 siblings, 9 replies; 45+ messages in thread
From: Umesh Nerlige Ramappa @ 2023-05-06  0:58 UTC
  To: intel-gfx, Tvrtko Ursulin, Ashutosh Dixit

With MTL, frequency and RC6 counters are specific to a GT. Export these
counters to user space via GT-specific events.

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Test-with: 20230506005528.1890922-1-umesh.nerlige.ramappa@intel.com 

Tvrtko Ursulin (6):
  drm/i915/pmu: Support PMU for all engines
  drm/i915/pmu: Skip sampling engines with no enabled counters
  drm/i915/pmu: Transform PMU parking code to be GT based
  drm/i915/pmu: Add reference counting to the sampling timer
  drm/i915/pmu: Prepare for multi-tile non-engine counters
  drm/i915/pmu: Export counters from all tiles

 drivers/gpu/drm/i915/gt/intel_gt_pm.c |   4 +-
 drivers/gpu/drm/i915/i915_pmu.c       | 271 ++++++++++++++++++--------
 drivers/gpu/drm/i915/i915_pmu.h       |  22 ++-
 include/uapi/drm/i915_drm.h           |  17 +-
 4 files changed, 225 insertions(+), 89 deletions(-)

-- 
2.36.1



* [Intel-gfx] [PATCH 1/6] drm/i915/pmu: Support PMU for all engines
  2023-05-06  0:58 [Intel-gfx] [PATCH 0/6] Add MTL PMU support for multi-gt Umesh Nerlige Ramappa
@ 2023-05-06  0:58 ` Umesh Nerlige Ramappa
  2023-05-08 17:52   ` Umesh Nerlige Ramappa
  2023-05-06  0:58 ` [Intel-gfx] [PATCH 2/6] drm/i915/pmu: Skip sampling engines with no enabled counters Umesh Nerlige Ramappa
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 45+ messages in thread
From: Umesh Nerlige Ramappa @ 2023-05-06  0:58 UTC
  To: intel-gfx, Tvrtko Ursulin, Ashutosh Dixit

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Given how the metrics are already exported, we also need to run sampling
over engines from all GTs.

The problem of GT frequencies is left for later.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
---
 drivers/gpu/drm/i915/i915_pmu.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index 7ece883a7d95..67fa6cd77529 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -10,6 +10,7 @@
 #include "gt/intel_engine_pm.h"
 #include "gt/intel_engine_regs.h"
 #include "gt/intel_engine_user.h"
+#include "gt/intel_gt.h"
 #include "gt/intel_gt_pm.h"
 #include "gt/intel_gt_regs.h"
 #include "gt/intel_rc6.h"
@@ -414,8 +415,9 @@ static enum hrtimer_restart i915_sample(struct hrtimer *hrtimer)
 	struct drm_i915_private *i915 =
 		container_of(hrtimer, struct drm_i915_private, pmu.timer);
 	struct i915_pmu *pmu = &i915->pmu;
-	struct intel_gt *gt = to_gt(i915);
 	unsigned int period_ns;
+	struct intel_gt *gt;
+	unsigned int i;
 	ktime_t now;
 
 	if (!READ_ONCE(pmu->timer_enabled))
@@ -431,8 +433,13 @@ static enum hrtimer_restart i915_sample(struct hrtimer *hrtimer)
 	 * grabbing the forcewake. However the potential error from timer call-
 	 * back delay greatly dominates this so we keep it simple.
 	 */
-	engines_sample(gt, period_ns);
-	frequency_sample(gt, period_ns);
+
+	for_each_gt(gt, i915, i) {
+		engines_sample(gt, period_ns);
+
+		if (i == 0) /* FIXME */
+			frequency_sample(gt, period_ns);
+	}
 
 	hrtimer_forward(hrtimer, now, ns_to_ktime(PERIOD));
 
-- 
2.36.1



* [Intel-gfx] [PATCH 2/6] drm/i915/pmu: Skip sampling engines with no enabled counters
  2023-05-06  0:58 [Intel-gfx] [PATCH 0/6] Add MTL PMU support for multi-gt Umesh Nerlige Ramappa
  2023-05-06  0:58 ` [Intel-gfx] [PATCH 1/6] drm/i915/pmu: Support PMU for all engines Umesh Nerlige Ramappa
@ 2023-05-06  0:58 ` Umesh Nerlige Ramappa
  2023-05-08 17:53   ` Umesh Nerlige Ramappa
  2023-05-06  0:58 ` [Intel-gfx] [PATCH 3/6] drm/i915/pmu: Transform PMU parking code to be GT based Umesh Nerlige Ramappa
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 45+ messages in thread
From: Umesh Nerlige Ramappa @ 2023-05-06  0:58 UTC
  To: intel-gfx, Tvrtko Ursulin, Ashutosh Dixit

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

As we have more and more engines, do not waste time sampling the ones
no one is monitoring.
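
A minimal user-space model of the new early-out, for illustration only: `struct engine_model` and `engines_sample_model()` are hypothetical, and the `awake` flag merely stands in for `intel_engine_pm_get_if_awake()`. Because the enable-mask test comes first, an unmonitored engine is skipped before any attempt to take an engine wakeref.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the parts of struct intel_engine_cs used here. */
struct engine_model {
	unsigned int pmu_enable;	/* like engine->pmu.enable */
	bool awake;			/* models intel_engine_pm_get_if_awake() */
	unsigned int samples;		/* times this engine was sampled */
};

/* Returns how many engines were actually sampled during this tick. */
static size_t engines_sample_model(struct engine_model *engines, size_t count)
{
	size_t i, sampled = 0;

	for (i = 0; i < count; i++) {
		/* The new check: skip engines no one is monitoring. */
		if (!engines[i].pmu_enable)
			continue;

		/* The existing check: never wake an idle engine to sample it. */
		if (!engines[i].awake)
			continue;

		engines[i].samples++;
		sampled++;
	}

	return sampled;
}
```

With one unmonitored engine, one monitored and awake engine, and one monitored but idle engine, only the second is sampled.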

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
---
 drivers/gpu/drm/i915/i915_pmu.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index 67fa6cd77529..ba769f7fc385 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -339,6 +339,9 @@ engines_sample(struct intel_gt *gt, unsigned int period_ns)
 		return;
 
 	for_each_engine(engine, gt, id) {
+		if (!engine->pmu.enable)
+			continue;
+
 		if (!intel_engine_pm_get_if_awake(engine))
 			continue;
 
-- 
2.36.1



* [Intel-gfx] [PATCH 3/6] drm/i915/pmu: Transform PMU parking code to be GT based
  2023-05-06  0:58 [Intel-gfx] [PATCH 0/6] Add MTL PMU support for multi-gt Umesh Nerlige Ramappa
  2023-05-06  0:58 ` [Intel-gfx] [PATCH 1/6] drm/i915/pmu: Support PMU for all engines Umesh Nerlige Ramappa
  2023-05-06  0:58 ` [Intel-gfx] [PATCH 2/6] drm/i915/pmu: Skip sampling engines with no enabled counters Umesh Nerlige Ramappa
@ 2023-05-06  0:58 ` Umesh Nerlige Ramappa
  2023-05-08 17:55   ` Umesh Nerlige Ramappa
  2023-05-06  0:58 ` [Intel-gfx] [PATCH 4/6] drm/i915/pmu: Add reference counting to the sampling timer Umesh Nerlige Ramappa
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 45+ messages in thread
From: Umesh Nerlige Ramappa @ 2023-05-06  0:58 UTC
  To: intel-gfx, Tvrtko Ursulin, Ashutosh Dixit

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Trivial prep work for full multi-tile enablement later.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_gt_pm.c |  4 ++--
 drivers/gpu/drm/i915/i915_pmu.c       | 16 ++++++++--------
 drivers/gpu/drm/i915/i915_pmu.h       |  9 +++++----
 3 files changed, 15 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.c b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
index e02cb90723ae..c2e69bafd02b 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_pm.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
@@ -87,7 +87,7 @@ static int __gt_unpark(struct intel_wakeref *wf)
 
 	intel_rc6_unpark(&gt->rc6);
 	intel_rps_unpark(&gt->rps);
-	i915_pmu_gt_unparked(i915);
+	i915_pmu_gt_unparked(gt);
 	intel_guc_busyness_unpark(gt);
 
 	intel_gt_unpark_requests(gt);
@@ -109,7 +109,7 @@ static int __gt_park(struct intel_wakeref *wf)
 
 	intel_guc_busyness_park(gt);
 	i915_vma_parked(gt);
-	i915_pmu_gt_parked(i915);
+	i915_pmu_gt_parked(gt);
 	intel_rps_park(&gt->rps);
 	intel_rc6_park(&gt->rc6);
 
diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index ba769f7fc385..2b63ee31e1b3 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -217,11 +217,11 @@ static void init_rc6(struct i915_pmu *pmu)
 	}
 }
 
-static void park_rc6(struct drm_i915_private *i915)
+static void park_rc6(struct intel_gt *gt)
 {
-	struct i915_pmu *pmu = &i915->pmu;
+	struct i915_pmu *pmu = &gt->i915->pmu;
 
-	pmu->sample[__I915_SAMPLE_RC6].cur = __get_rc6(to_gt(i915));
+	pmu->sample[__I915_SAMPLE_RC6].cur = __get_rc6(gt);
 	pmu->sleep_last = ktime_get_raw();
 }
 
@@ -236,16 +236,16 @@ static void __i915_pmu_maybe_start_timer(struct i915_pmu *pmu)
 	}
 }
 
-void i915_pmu_gt_parked(struct drm_i915_private *i915)
+void i915_pmu_gt_parked(struct intel_gt *gt)
 {
-	struct i915_pmu *pmu = &i915->pmu;
+	struct i915_pmu *pmu = &gt->i915->pmu;
 
 	if (!pmu->base.event_init)
 		return;
 
 	spin_lock_irq(&pmu->lock);
 
-	park_rc6(i915);
+	park_rc6(gt);
 
 	/*
 	 * Signal sampling timer to stop if only engine events are enabled and
@@ -256,9 +256,9 @@ void i915_pmu_gt_parked(struct drm_i915_private *i915)
 	spin_unlock_irq(&pmu->lock);
 }
 
-void i915_pmu_gt_unparked(struct drm_i915_private *i915)
+void i915_pmu_gt_unparked(struct intel_gt *gt)
 {
-	struct i915_pmu *pmu = &i915->pmu;
+	struct i915_pmu *pmu = &gt->i915->pmu;
 
 	if (!pmu->base.event_init)
 		return;
diff --git a/drivers/gpu/drm/i915/i915_pmu.h b/drivers/gpu/drm/i915/i915_pmu.h
index c30f43319a78..a686fd7ccedf 100644
--- a/drivers/gpu/drm/i915/i915_pmu.h
+++ b/drivers/gpu/drm/i915/i915_pmu.h
@@ -13,6 +13,7 @@
 #include <uapi/drm/i915_drm.h>
 
 struct drm_i915_private;
+struct intel_gt;
 
 /*
  * Non-engine events that we need to track enabled-disabled transition and
@@ -151,15 +152,15 @@ int i915_pmu_init(void);
 void i915_pmu_exit(void);
 void i915_pmu_register(struct drm_i915_private *i915);
 void i915_pmu_unregister(struct drm_i915_private *i915);
-void i915_pmu_gt_parked(struct drm_i915_private *i915);
-void i915_pmu_gt_unparked(struct drm_i915_private *i915);
+void i915_pmu_gt_parked(struct intel_gt *gt);
+void i915_pmu_gt_unparked(struct intel_gt *gt);
 #else
 static inline int i915_pmu_init(void) { return 0; }
 static inline void i915_pmu_exit(void) {}
 static inline void i915_pmu_register(struct drm_i915_private *i915) {}
 static inline void i915_pmu_unregister(struct drm_i915_private *i915) {}
-static inline void i915_pmu_gt_parked(struct drm_i915_private *i915) {}
-static inline void i915_pmu_gt_unparked(struct drm_i915_private *i915) {}
+static inline void i915_pmu_gt_parked(struct intel_gt *gt) {}
+static inline void i915_pmu_gt_unparked(struct intel_gt *gt) {}
 #endif
 
 #endif
-- 
2.36.1



* [Intel-gfx] [PATCH 4/6] drm/i915/pmu: Add reference counting to the sampling timer
  2023-05-06  0:58 [Intel-gfx] [PATCH 0/6] Add MTL PMU support for multi-gt Umesh Nerlige Ramappa
                   ` (2 preceding siblings ...)
  2023-05-06  0:58 ` [Intel-gfx] [PATCH 3/6] drm/i915/pmu: Transform PMU parking code to be GT based Umesh Nerlige Ramappa
@ 2023-05-06  0:58 ` Umesh Nerlige Ramappa
  2023-05-08 17:58   ` Umesh Nerlige Ramappa
                     ` (2 more replies)
  2023-05-06  0:58 ` [Intel-gfx] [PATCH 5/6] drm/i915/pmu: Prepare for multi-tile non-engine counters Umesh Nerlige Ramappa
                   ` (4 subsequent siblings)
  8 siblings, 3 replies; 45+ messages in thread
From: Umesh Nerlige Ramappa @ 2023-05-06  0:58 UTC
  To: intel-gfx, Tvrtko Ursulin, Ashutosh Dixit

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

We do not want per-tile timers wasting CPU cycles and energy through
multiple wake-up sources for the relatively unimportant task of PMU
sampling, so keeping a single timer works well. But we also do not want
the first GT that goes idle to turn off the timer.

Add some reference counting, via a mask of unparked GTs, to solve this.
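
A minimal sketch of the mask-based reference counting, under simplifying assumptions: `struct pmu_model` and the `model_gt_*()` helpers are hypothetical, locking is left out (the driver does these updates under `pmu->lock`), and the timer is assumed to be needed exactly while some GT is unparked, whereas the driver consults `pmu_needs_timer()` when the last GT parks and `__i915_pmu_maybe_start_timer()` when the first one unparks.

```c
#include <stdbool.h>

/* Hypothetical model of the single shared sampling timer. */
struct pmu_model {
	unsigned int unparked;	/* bit N set while GT N is awake */
	bool timer_enabled;	/* stands in for the hrtimer state */
};

static void model_gt_unparked(struct pmu_model *pmu, unsigned int gt_id)
{
	/* Only the first GT to wake up (re)starts the timer. */
	if (pmu->unparked == 0)
		pmu->timer_enabled = true;

	pmu->unparked |= 1u << gt_id;
}

static void model_gt_parked(struct pmu_model *pmu, unsigned int gt_id)
{
	pmu->unparked &= ~(1u << gt_id);

	/* Only the last GT to go idle may stop the timer. */
	if (pmu->unparked == 0)
		pmu->timer_enabled = false;
}
```

Unparking GT0 and GT1 and then parking GT0 leaves the timer running; it stops only once GT1 parks as well.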

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
---
 drivers/gpu/drm/i915/i915_pmu.c | 12 ++++++++++--
 drivers/gpu/drm/i915/i915_pmu.h |  4 ++++
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index 2b63ee31e1b3..669a42e44082 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -251,7 +251,9 @@ void i915_pmu_gt_parked(struct intel_gt *gt)
 	 * Signal sampling timer to stop if only engine events are enabled and
 	 * GPU went idle.
 	 */
-	pmu->timer_enabled = pmu_needs_timer(pmu, false);
+	pmu->unparked &= ~BIT(gt->info.id);
+	if (pmu->unparked == 0)
+		pmu->timer_enabled = pmu_needs_timer(pmu, false);
 
 	spin_unlock_irq(&pmu->lock);
 }
@@ -268,7 +270,10 @@ void i915_pmu_gt_unparked(struct intel_gt *gt)
 	/*
 	 * Re-enable sampling timer when GPU goes active.
 	 */
-	__i915_pmu_maybe_start_timer(pmu);
+	if (pmu->unparked == 0)
+		__i915_pmu_maybe_start_timer(pmu);
+
+	pmu->unparked |= BIT(gt->info.id);
 
 	spin_unlock_irq(&pmu->lock);
 }
@@ -438,6 +443,9 @@ static enum hrtimer_restart i915_sample(struct hrtimer *hrtimer)
 	 */
 
 	for_each_gt(gt, i915, i) {
+		if (!(pmu->unparked & BIT(i)))
+			continue;
+
 		engines_sample(gt, period_ns);
 
 		if (i == 0) /* FIXME */
diff --git a/drivers/gpu/drm/i915/i915_pmu.h b/drivers/gpu/drm/i915/i915_pmu.h
index a686fd7ccedf..3a811266ac6a 100644
--- a/drivers/gpu/drm/i915/i915_pmu.h
+++ b/drivers/gpu/drm/i915/i915_pmu.h
@@ -76,6 +76,10 @@ struct i915_pmu {
 	 * @lock: Lock protecting enable mask and ref count handling.
 	 */
 	spinlock_t lock;
+	/**
+	 * @unparked: GT unparked mask.
+	 */
+	unsigned int unparked;
 	/**
 	 * @timer: Timer for internal i915 PMU sampling.
 	 */
-- 
2.36.1



* [Intel-gfx] [PATCH 5/6] drm/i915/pmu: Prepare for multi-tile non-engine counters
  2023-05-06  0:58 [Intel-gfx] [PATCH 0/6] Add MTL PMU support for multi-gt Umesh Nerlige Ramappa
                   ` (3 preceding siblings ...)
  2023-05-06  0:58 ` [Intel-gfx] [PATCH 4/6] drm/i915/pmu: Add reference counting to the sampling timer Umesh Nerlige Ramappa
@ 2023-05-06  0:58 ` Umesh Nerlige Ramappa
  2023-05-08 18:07   ` Umesh Nerlige Ramappa
  2023-05-12  1:08   ` Dixit, Ashutosh
  2023-05-06  0:58 ` [Intel-gfx] [PATCH 6/6] drm/i915/pmu: Export counters from all tiles Umesh Nerlige Ramappa
                   ` (3 subsequent siblings)
  8 siblings, 2 replies; 45+ messages in thread
From: Umesh Nerlige Ramappa @ 2023-05-06  0:58 UTC
  To: intel-gfx, Tvrtko Ursulin, Ashutosh Dixit

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Reserve some bits in the counter config namespace which will carry the
tile id and prepare the code to handle this.

No per-tile counters are added yet.
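
The encoding can be illustrated with a small standalone sketch that mirrors `___I915_PMU_OTHER()`, `config_gt_id()` and `config_counter()` from the diff below. The engine-event layout it builds on (4 sample bits and 8 instance bits below the engine class) is an assumption about `include/uapi/drm/i915_drm.h`, which this patch does not show, and the macro names are local stand-ins rather than the real uAPI names.

```c
#include <stdint.h>

/* Assumed engine-event encoding (not shown in this patch). */
#define SAMPLE_BITS		4
#define SAMPLE_INSTANCE_BITS	8
#define CLASS_SHIFT		(SAMPLE_BITS + SAMPLE_INSTANCE_BITS)
#define PMU_ENGINE(class, instance, sample) \
	((uint64_t)(class) << CLASS_SHIFT | \
	 (uint64_t)(instance) << SAMPLE_BITS | (uint64_t)(sample))

/* From the patch: the top 4 bits of a non-engine config carry the GT id. */
#define PMU_GT_SHIFT		60
#define PMU_OTHER(gt, x) \
	((PMU_ENGINE(0xff, 0xff, 0xf) + 1 + (uint64_t)(x)) | \
	 ((uint64_t)(gt) << PMU_GT_SHIFT))

/* Extract the GT id from the top bits. */
static inline unsigned int config_gt_id(uint64_t config)
{
	return (unsigned int)(config >> PMU_GT_SHIFT);
}

/* Strip the GT id, leaving the legacy (GT0) counter value. */
static inline uint64_t config_counter(uint64_t config)
{
	return config & ~(~0ULL << PMU_GT_SHIFT);
}
```

For example, `PMU_OTHER(1, 3)` (rc6-residency on GT1 in the patch's numbering) yields 0x1000000000100003, which decodes back to GT id 1 and counter 0x100003, while GT0 configs keep their legacy values.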

v2:
- Fix checkpatch issues
- Use 4 bits for gt id in non-engine counters. Drop FIXME.
- Set MAX GTs to 4. Drop FIXME.
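
A sketch of the resulting flat per-GT layout, assuming the enum sizes at the time of this series (three engine sample types, three tracked non-engine events and four samplers per GT). These values are restated by hand rather than taken from the headers, so treat them as assumptions. With four GTs this gives 15 enable-mask bits and 16 `pmu->sample[]` slots.

```c
/* Assumed enum sizes, restated so the sketch compiles on its own. */
#define PMU_MAX_GTS		4	/* I915_PMU_MAX_GTS */
#define ENGINE_SAMPLE_COUNT	3	/* busy, wait, sema */
#define TRACKED_EVENT_COUNT	3	/* actual freq, requested freq, rc6 */
#define NUM_PMU_SAMPLERS	4	/* freq act, freq req, rc6, rc6 last reported */

/* Size of the enable mask, as I915_PMU_MASK_BITS computes it. */
#define PMU_MASK_BITS (ENGINE_SAMPLE_COUNT + PMU_MAX_GTS * TRACKED_EVENT_COUNT)

/* Size of pmu->sample[] after this patch. */
#define PMU_SAMPLE_SLOTS (PMU_MAX_GTS * NUM_PMU_SAMPLERS)

/* Mirrors other_bit(): each GT gets its own block after the engine bits. */
static inline unsigned int other_bit_model(unsigned int gt_id, unsigned int val)
{
	return ENGINE_SAMPLE_COUNT + gt_id * TRACKED_EVENT_COUNT + val;
}

/* Mirrors __sample_idx(): pmu->sample[] holds one block per GT. */
static inline unsigned int sample_idx_model(unsigned int gt_id, unsigned int sample)
{
	return gt_id * NUM_PMU_SAMPLERS + sample;
}
```

The last GT's last tracked event lands on the final mask bit, and its last sampler on the final array slot, so the sizes leave no gaps.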

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
---
 drivers/gpu/drm/i915/i915_pmu.c | 150 +++++++++++++++++++++++---------
 drivers/gpu/drm/i915/i915_pmu.h |   9 +-
 include/uapi/drm/i915_drm.h     |  17 +++-
 3 files changed, 129 insertions(+), 47 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index 669a42e44082..12b2f3169abf 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -56,11 +56,21 @@ static bool is_engine_config(u64 config)
 	return config < __I915_PMU_OTHER(0);
 }
 
+static unsigned int config_gt_id(const u64 config)
+{
+	return config >> __I915_PMU_GT_SHIFT;
+}
+
+static u64 config_counter(const u64 config)
+{
+	return config & ~(~0ULL << __I915_PMU_GT_SHIFT);
+}
+
 static unsigned int other_bit(const u64 config)
 {
 	unsigned int val;
 
-	switch (config) {
+	switch (config_counter(config)) {
 	case I915_PMU_ACTUAL_FREQUENCY:
 		val =  __I915_PMU_ACTUAL_FREQUENCY_ENABLED;
 		break;
@@ -78,15 +88,20 @@ static unsigned int other_bit(const u64 config)
 		return -1;
 	}
 
-	return I915_ENGINE_SAMPLE_COUNT + val;
+	return I915_ENGINE_SAMPLE_COUNT +
+	       config_gt_id(config) * __I915_PMU_TRACKED_EVENT_COUNT +
+	       val;
 }
 
 static unsigned int config_bit(const u64 config)
 {
-	if (is_engine_config(config))
+	if (is_engine_config(config)) {
+		GEM_BUG_ON(config_gt_id(config));
+
 		return engine_config_sample(config);
-	else
+	} else {
 		return other_bit(config);
+	}
 }
 
 static u64 config_mask(u64 config)
@@ -104,6 +119,18 @@ static unsigned int event_bit(struct perf_event *event)
 	return config_bit(event->attr.config);
 }
 
+static u64 frequency_enabled_mask(void)
+{
+	unsigned int i;
+	u64 mask = 0;
+
+	for (i = 0; i < I915_PMU_MAX_GTS; i++)
+		mask |= config_mask(__I915_PMU_ACTUAL_FREQUENCY(i)) |
+			config_mask(__I915_PMU_REQUESTED_FREQUENCY(i));
+
+	return mask;
+}
+
 static bool pmu_needs_timer(struct i915_pmu *pmu, bool gpu_active)
 {
 	struct drm_i915_private *i915 = container_of(pmu, typeof(*i915), pmu);
@@ -120,9 +147,7 @@ static bool pmu_needs_timer(struct i915_pmu *pmu, bool gpu_active)
 	 * Mask out all the ones which do not need the timer, or in
 	 * other words keep all the ones that could need the timer.
 	 */
-	enable &= config_mask(I915_PMU_ACTUAL_FREQUENCY) |
-		  config_mask(I915_PMU_REQUESTED_FREQUENCY) |
-		  ENGINE_SAMPLE_MASK;
+	enable &= frequency_enabled_mask() | ENGINE_SAMPLE_MASK;
 
 	/*
 	 * When the GPU is idle per-engine counters do not need to be
@@ -164,9 +189,37 @@ static inline s64 ktime_since_raw(const ktime_t kt)
 	return ktime_to_ns(ktime_sub(ktime_get_raw(), kt));
 }
 
+static unsigned int
+__sample_idx(struct i915_pmu *pmu, unsigned int gt_id, int sample)
+{
+	unsigned int idx = gt_id * __I915_NUM_PMU_SAMPLERS + sample;
+
+	GEM_BUG_ON(idx >= ARRAY_SIZE(pmu->sample));
+
+	return idx;
+}
+
+static u64 read_sample(struct i915_pmu *pmu, unsigned int gt_id, int sample)
+{
+	return pmu->sample[__sample_idx(pmu, gt_id, sample)].cur;
+}
+
+static void
+store_sample(struct i915_pmu *pmu, unsigned int gt_id, int sample, u64 val)
+{
+	pmu->sample[__sample_idx(pmu, gt_id, sample)].cur = val;
+}
+
+static void
+add_sample_mult(struct i915_pmu *pmu, unsigned int gt_id, int sample, u32 val, u32 mul)
+{
+	pmu->sample[__sample_idx(pmu, gt_id, sample)].cur += mul_u32_u32(val, mul);
+}
+
 static u64 get_rc6(struct intel_gt *gt)
 {
 	struct drm_i915_private *i915 = gt->i915;
+	const unsigned int gt_id = gt->info.id;
 	struct i915_pmu *pmu = &i915->pmu;
 	unsigned long flags;
 	bool awake = false;
@@ -181,7 +234,7 @@ static u64 get_rc6(struct intel_gt *gt)
 	spin_lock_irqsave(&pmu->lock, flags);
 
 	if (awake) {
-		pmu->sample[__I915_SAMPLE_RC6].cur = val;
+		store_sample(pmu, gt_id, __I915_SAMPLE_RC6, val);
 	} else {
 		/*
 		 * We think we are runtime suspended.
@@ -190,14 +243,14 @@ static u64 get_rc6(struct intel_gt *gt)
 		 * on top of the last known real value, as the approximated RC6
 		 * counter value.
 		 */
-		val = ktime_since_raw(pmu->sleep_last);
-		val += pmu->sample[__I915_SAMPLE_RC6].cur;
+		val = ktime_since_raw(pmu->sleep_last[gt_id]);
+		val += read_sample(pmu, gt_id, __I915_SAMPLE_RC6);
 	}
 
-	if (val < pmu->sample[__I915_SAMPLE_RC6_LAST_REPORTED].cur)
-		val = pmu->sample[__I915_SAMPLE_RC6_LAST_REPORTED].cur;
+	if (val < read_sample(pmu, gt_id, __I915_SAMPLE_RC6_LAST_REPORTED))
+		val = read_sample(pmu, gt_id, __I915_SAMPLE_RC6_LAST_REPORTED);
 	else
-		pmu->sample[__I915_SAMPLE_RC6_LAST_REPORTED].cur = val;
+		store_sample(pmu, gt_id, __I915_SAMPLE_RC6_LAST_REPORTED, val);
 
 	spin_unlock_irqrestore(&pmu->lock, flags);
 
@@ -207,13 +260,20 @@ static u64 get_rc6(struct intel_gt *gt)
 static void init_rc6(struct i915_pmu *pmu)
 {
 	struct drm_i915_private *i915 = container_of(pmu, typeof(*i915), pmu);
-	intel_wakeref_t wakeref;
+	struct intel_gt *gt;
+	unsigned int i;
+
+	for_each_gt(gt, i915, i) {
+		intel_wakeref_t wakeref;
 
-	with_intel_runtime_pm(to_gt(i915)->uncore->rpm, wakeref) {
-		pmu->sample[__I915_SAMPLE_RC6].cur = __get_rc6(to_gt(i915));
-		pmu->sample[__I915_SAMPLE_RC6_LAST_REPORTED].cur =
-					pmu->sample[__I915_SAMPLE_RC6].cur;
-		pmu->sleep_last = ktime_get_raw();
+		with_intel_runtime_pm(gt->uncore->rpm, wakeref) {
+			u64 val = __get_rc6(gt);
+
+			store_sample(pmu, i, __I915_SAMPLE_RC6, val);
+			store_sample(pmu, i, __I915_SAMPLE_RC6_LAST_REPORTED,
+				     val);
+			pmu->sleep_last[i] = ktime_get_raw();
+		}
 	}
 }
 
@@ -221,8 +281,8 @@ static void park_rc6(struct intel_gt *gt)
 {
 	struct i915_pmu *pmu = &gt->i915->pmu;
 
-	pmu->sample[__I915_SAMPLE_RC6].cur = __get_rc6(gt);
-	pmu->sleep_last = ktime_get_raw();
+	store_sample(pmu, gt->info.id, __I915_SAMPLE_RC6, __get_rc6(gt));
+	pmu->sleep_last[gt->info.id] = ktime_get_raw();
 }
 
 static void __i915_pmu_maybe_start_timer(struct i915_pmu *pmu)
@@ -362,34 +422,30 @@ engines_sample(struct intel_gt *gt, unsigned int period_ns)
 	}
 }
 
-static void
-add_sample_mult(struct i915_pmu_sample *sample, u32 val, u32 mul)
-{
-	sample->cur += mul_u32_u32(val, mul);
-}
-
-static bool frequency_sampling_enabled(struct i915_pmu *pmu)
+static bool
+frequency_sampling_enabled(struct i915_pmu *pmu, unsigned int gt)
 {
 	return pmu->enable &
-	       (config_mask(I915_PMU_ACTUAL_FREQUENCY) |
-		config_mask(I915_PMU_REQUESTED_FREQUENCY));
+	       (config_mask(__I915_PMU_ACTUAL_FREQUENCY(gt)) |
+		config_mask(__I915_PMU_REQUESTED_FREQUENCY(gt)));
 }
 
 static void
 frequency_sample(struct intel_gt *gt, unsigned int period_ns)
 {
 	struct drm_i915_private *i915 = gt->i915;
+	const unsigned int gt_id = gt->info.id;
 	struct i915_pmu *pmu = &i915->pmu;
 	struct intel_rps *rps = &gt->rps;
 
-	if (!frequency_sampling_enabled(pmu))
+	if (!frequency_sampling_enabled(pmu, gt_id))
 		return;
 
 	/* Report 0/0 (actual/requested) frequency while parked. */
 	if (!intel_gt_pm_get_if_awake(gt))
 		return;
 
-	if (pmu->enable & config_mask(I915_PMU_ACTUAL_FREQUENCY)) {
+	if (pmu->enable & config_mask(__I915_PMU_ACTUAL_FREQUENCY(gt_id))) {
 		u32 val;
 
 		/*
@@ -405,12 +461,12 @@ frequency_sample(struct intel_gt *gt, unsigned int period_ns)
 		if (!val)
 			val = intel_gpu_freq(rps, rps->cur_freq);
 
-		add_sample_mult(&pmu->sample[__I915_SAMPLE_FREQ_ACT],
+		add_sample_mult(pmu, gt_id, __I915_SAMPLE_FREQ_ACT,
 				val, period_ns / 1000);
 	}
 
-	if (pmu->enable & config_mask(I915_PMU_REQUESTED_FREQUENCY)) {
-		add_sample_mult(&pmu->sample[__I915_SAMPLE_FREQ_REQ],
+	if (pmu->enable & config_mask(__I915_PMU_REQUESTED_FREQUENCY(gt_id))) {
+		add_sample_mult(pmu, gt_id, __I915_SAMPLE_FREQ_REQ,
 				intel_rps_get_requested_frequency(rps),
 				period_ns / 1000);
 	}
@@ -447,9 +503,7 @@ static enum hrtimer_restart i915_sample(struct hrtimer *hrtimer)
 			continue;
 
 		engines_sample(gt, period_ns);
-
-		if (i == 0) /* FIXME */
-			frequency_sample(gt, period_ns);
+		frequency_sample(gt, period_ns);
 	}
 
 	hrtimer_forward(hrtimer, now, ns_to_ktime(PERIOD));
@@ -491,7 +545,12 @@ config_status(struct drm_i915_private *i915, u64 config)
 {
 	struct intel_gt *gt = to_gt(i915);
 
-	switch (config) {
+	unsigned int gt_id = config_gt_id(config);
+
+	if (gt_id)
+		return -ENOENT;
+
+	switch (config_counter(config)) {
 	case I915_PMU_ACTUAL_FREQUENCY:
 		if (IS_VALLEYVIEW(i915) || IS_CHERRYVIEW(i915))
 			/* Requires a mutex for sampling! */
@@ -599,22 +658,27 @@ static u64 __i915_pmu_event_read(struct perf_event *event)
 			val = engine->pmu.sample[sample].cur;
 		}
 	} else {
-		switch (event->attr.config) {
+		const unsigned int gt_id = config_gt_id(event->attr.config);
+		const u64 config = config_counter(event->attr.config);
+
+		switch (config) {
 		case I915_PMU_ACTUAL_FREQUENCY:
 			val =
-			   div_u64(pmu->sample[__I915_SAMPLE_FREQ_ACT].cur,
+			   div_u64(read_sample(pmu, gt_id,
+					       __I915_SAMPLE_FREQ_ACT),
 				   USEC_PER_SEC /* to MHz */);
 			break;
 		case I915_PMU_REQUESTED_FREQUENCY:
 			val =
-			   div_u64(pmu->sample[__I915_SAMPLE_FREQ_REQ].cur,
+			   div_u64(read_sample(pmu, gt_id,
+					       __I915_SAMPLE_FREQ_REQ),
 				   USEC_PER_SEC /* to MHz */);
 			break;
 		case I915_PMU_INTERRUPTS:
 			val = READ_ONCE(pmu->irq_count);
 			break;
 		case I915_PMU_RC6_RESIDENCY:
-			val = get_rc6(to_gt(i915));
+			val = get_rc6(i915->gt[gt_id]);
 			break;
 		case I915_PMU_SOFTWARE_GT_AWAKE_TIME:
 			val = ktime_to_ns(intel_gt_get_awake_time(to_gt(i915)));
diff --git a/drivers/gpu/drm/i915/i915_pmu.h b/drivers/gpu/drm/i915/i915_pmu.h
index 3a811266ac6a..d47846f21ddf 100644
--- a/drivers/gpu/drm/i915/i915_pmu.h
+++ b/drivers/gpu/drm/i915/i915_pmu.h
@@ -38,13 +38,16 @@ enum {
 	__I915_NUM_PMU_SAMPLERS
 };
 
+#define I915_PMU_MAX_GTS (4)
+
 /*
  * How many different events we track in the global PMU mask.
  *
  * It is also used to know to needed number of event reference counters.
  */
 #define I915_PMU_MASK_BITS \
-	(I915_ENGINE_SAMPLE_COUNT + __I915_PMU_TRACKED_EVENT_COUNT)
+	(I915_ENGINE_SAMPLE_COUNT + \
+	 I915_PMU_MAX_GTS * __I915_PMU_TRACKED_EVENT_COUNT)
 
 #define I915_ENGINE_SAMPLE_COUNT (I915_SAMPLE_SEMA + 1)
 
@@ -124,11 +127,11 @@ struct i915_pmu {
 	 * Only global counters are held here, while the per-engine ones are in
 	 * struct intel_engine_cs.
 	 */
-	struct i915_pmu_sample sample[__I915_NUM_PMU_SAMPLERS];
+	struct i915_pmu_sample sample[I915_PMU_MAX_GTS * __I915_NUM_PMU_SAMPLERS];
 	/**
 	 * @sleep_last: Last time GT parked for RC6 estimation.
 	 */
-	ktime_t sleep_last;
+	ktime_t sleep_last[I915_PMU_MAX_GTS];
 	/**
 	 * @irq_count: Number of interrupts
 	 *
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index dba7c5a5b25e..d5ac1fdeb2b1 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -280,7 +280,16 @@ enum drm_i915_pmu_engine_sample {
 #define I915_PMU_ENGINE_SEMA(class, instance) \
 	__I915_PMU_ENGINE(class, instance, I915_SAMPLE_SEMA)
 
-#define __I915_PMU_OTHER(x) (__I915_PMU_ENGINE(0xff, 0xff, 0xf) + 1 + (x))
+/*
+ * Top 4 bits of every non-engine counter are GT id.
+ */
+#define __I915_PMU_GT_SHIFT (60)
+
+#define ___I915_PMU_OTHER(gt, x) \
+	(((__u64)__I915_PMU_ENGINE(0xff, 0xff, 0xf) + 1 + (x)) | \
+	((__u64)(gt) << __I915_PMU_GT_SHIFT))
+
+#define __I915_PMU_OTHER(x) ___I915_PMU_OTHER(0, x)
 
 #define I915_PMU_ACTUAL_FREQUENCY	__I915_PMU_OTHER(0)
 #define I915_PMU_REQUESTED_FREQUENCY	__I915_PMU_OTHER(1)
@@ -290,6 +299,12 @@ enum drm_i915_pmu_engine_sample {
 
 #define I915_PMU_LAST /* Deprecated - do not use */ I915_PMU_RC6_RESIDENCY
 
+#define __I915_PMU_ACTUAL_FREQUENCY(gt)		___I915_PMU_OTHER(gt, 0)
+#define __I915_PMU_REQUESTED_FREQUENCY(gt)	___I915_PMU_OTHER(gt, 1)
+#define __I915_PMU_INTERRUPTS(gt)		___I915_PMU_OTHER(gt, 2)
+#define __I915_PMU_RC6_RESIDENCY(gt)		___I915_PMU_OTHER(gt, 3)
+#define __I915_PMU_SOFTWARE_GT_AWAKE_TIME(gt)	___I915_PMU_OTHER(gt, 4)
+
 /* Each region is a minimum of 16k, and there are at most 255 of them.
  */
 #define I915_NR_TEX_REGIONS 255	/* table size 2k - maximum due to use
-- 
2.36.1



* [Intel-gfx] [PATCH 6/6] drm/i915/pmu: Export counters from all tiles
  2023-05-06  0:58 [Intel-gfx] [PATCH 0/6] Add MTL PMU support for multi-gt Umesh Nerlige Ramappa
                   ` (4 preceding siblings ...)
  2023-05-06  0:58 ` [Intel-gfx] [PATCH 5/6] drm/i915/pmu: Prepare for multi-tile non-engine counters Umesh Nerlige Ramappa
@ 2023-05-06  0:58 ` Umesh Nerlige Ramappa
  2023-05-08 18:08   ` Umesh Nerlige Ramappa
                     ` (2 more replies)
  2023-05-06  2:20 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for Add MTL PMU support for multi-gt (rev2) Patchwork
                   ` (2 subsequent siblings)
  8 siblings, 3 replies; 45+ messages in thread
From: Umesh Nerlige Ramappa @ 2023-05-06  0:58 UTC
  To: intel-gfx, Tvrtko Ursulin, Ashutosh Dixit

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Start exporting frequency and RC6 counters from all tiles.

Existing counters keep their names and config values, while new ones use
the namespace added in the previous patch, with "-gtN" appended to their
names.

The interrupts counter is the odd one out. Because it is a global device
counter (not only GT), we choose not to add per-tile versions for now.
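
The naming rule can be sketched as follows. `event_name()` is a hypothetical helper: `multi_gt` stands in for `HAS_EXTRA_GT_LIST(i915)` and `suffix` covers the companion `.unit` attribute, whereas the driver builds the same strings with `kstrdup()` and `kasprintf()`.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Build a sysfs event name: global events, and every event on a
 * single-GT device, keep the plain name; otherwise "-gtN" is appended.
 * Pass "" as suffix for the event itself or ".unit" for its unit file.
 * Returns the snprintf() result, i.e. the length of the full name.
 */
static int event_name(char *buf, size_t len, const char *name,
		      const char *suffix, bool global, bool multi_gt,
		      unsigned int gt_id)
{
	if (global || !multi_gt)
		return snprintf(buf, len, "%s%s", name, suffix);

	return snprintf(buf, len, "%s-gt%u%s", name, gt_id, suffix);
}
```

On a two-GT part this yields, for example, `rc6-residency-gt0` and `rc6-residency-gt1` (plus matching `.unit` files), while `interrupts` appears only once because `config_status()` rejects it for any GT other than 0.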

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com>
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
---
 drivers/gpu/drm/i915/i915_pmu.c | 87 ++++++++++++++++++++++-----------
 1 file changed, 59 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index 12b2f3169abf..284e5c5b97bb 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -546,8 +546,9 @@ config_status(struct drm_i915_private *i915, u64 config)
 	struct intel_gt *gt = to_gt(i915);
 
 	unsigned int gt_id = config_gt_id(config);
+	unsigned int max_gt_id = HAS_EXTRA_GT_LIST(i915) ? 1 : 0;
 
-	if (gt_id)
+	if (gt_id > max_gt_id)
 		return -ENOENT;
 
 	switch (config_counter(config)) {
@@ -561,6 +562,8 @@ config_status(struct drm_i915_private *i915, u64 config)
 			return -ENODEV;
 		break;
 	case I915_PMU_INTERRUPTS:
+		if (gt_id)
+			return -ENOENT;
 		break;
 	case I915_PMU_RC6_RESIDENCY:
 		if (!gt->rc6.supported)
@@ -930,11 +933,20 @@ static const struct attribute_group i915_pmu_cpumask_attr_group = {
 	.attrs = i915_cpumask_attrs,
 };
 
-#define __event(__config, __name, __unit) \
+#define __event(__counter, __name, __unit) \
 { \
-	.config = (__config), \
+	.counter = (__counter), \
 	.name = (__name), \
 	.unit = (__unit), \
+	.global = false, \
+}
+
+#define __global_event(__counter, __name, __unit) \
+{ \
+	.counter = (__counter), \
+	.name = (__name), \
+	.unit = (__unit), \
+	.global = true, \
 }
 
 #define __engine_event(__sample, __name) \
@@ -973,15 +985,16 @@ create_event_attributes(struct i915_pmu *pmu)
 {
 	struct drm_i915_private *i915 = container_of(pmu, typeof(*i915), pmu);
 	static const struct {
-		u64 config;
+		unsigned int counter;
 		const char *name;
 		const char *unit;
+		bool global;
 	} events[] = {
-		__event(I915_PMU_ACTUAL_FREQUENCY, "actual-frequency", "M"),
-		__event(I915_PMU_REQUESTED_FREQUENCY, "requested-frequency", "M"),
-		__event(I915_PMU_INTERRUPTS, "interrupts", NULL),
-		__event(I915_PMU_RC6_RESIDENCY, "rc6-residency", "ns"),
-		__event(I915_PMU_SOFTWARE_GT_AWAKE_TIME, "software-gt-awake-time", "ns"),
+		__event(0, "actual-frequency", "M"),
+		__event(1, "requested-frequency", "M"),
+		__global_event(2, "interrupts", NULL),
+		__event(3, "rc6-residency", "ns"),
+		__event(4, "software-gt-awake-time", "ns"),
 	};
 	static const struct {
 		enum drm_i915_pmu_engine_sample sample;
@@ -996,12 +1009,17 @@ create_event_attributes(struct i915_pmu *pmu)
 	struct i915_ext_attribute *i915_attr = NULL, *i915_iter;
 	struct attribute **attr = NULL, **attr_iter;
 	struct intel_engine_cs *engine;
-	unsigned int i;
+	struct intel_gt *gt;
+	unsigned int i, j;
 
 	/* Count how many counters we will be exposing. */
-	for (i = 0; i < ARRAY_SIZE(events); i++) {
-		if (!config_status(i915, events[i].config))
-			count++;
+	for_each_gt(gt, i915, j) {
+		for (i = 0; i < ARRAY_SIZE(events); i++) {
+			u64 config = ___I915_PMU_OTHER(j, events[i].counter);
+
+			if (!config_status(i915, config))
+				count++;
+		}
 	}
 
 	for_each_uabi_engine(engine, i915) {
@@ -1031,26 +1049,39 @@ create_event_attributes(struct i915_pmu *pmu)
 	attr_iter = attr;
 
 	/* Initialize supported non-engine counters. */
-	for (i = 0; i < ARRAY_SIZE(events); i++) {
-		char *str;
-
-		if (config_status(i915, events[i].config))
-			continue;
-
-		str = kstrdup(events[i].name, GFP_KERNEL);
-		if (!str)
-			goto err;
+	for_each_gt(gt, i915, j) {
+		for (i = 0; i < ARRAY_SIZE(events); i++) {
+			u64 config = ___I915_PMU_OTHER(j, events[i].counter);
+			char *str;
 
-		*attr_iter++ = &i915_iter->attr.attr;
-		i915_iter = add_i915_attr(i915_iter, str, events[i].config);
+			if (config_status(i915, config))
+				continue;
 
-		if (events[i].unit) {
-			str = kasprintf(GFP_KERNEL, "%s.unit", events[i].name);
+			if (events[i].global || !HAS_EXTRA_GT_LIST(i915))
+				str = kstrdup(events[i].name, GFP_KERNEL);
+			else
+				str = kasprintf(GFP_KERNEL, "%s-gt%u",
+						events[i].name, j);
 			if (!str)
 				goto err;
 
-			*attr_iter++ = &pmu_iter->attr.attr;
-			pmu_iter = add_pmu_attr(pmu_iter, str, events[i].unit);
+			*attr_iter++ = &i915_iter->attr.attr;
+			i915_iter = add_i915_attr(i915_iter, str, config);
+
+			if (events[i].unit) {
+				if (events[i].global || !HAS_EXTRA_GT_LIST(i915))
+					str = kasprintf(GFP_KERNEL, "%s.unit",
+							events[i].name);
+				else
+					str = kasprintf(GFP_KERNEL, "%s-gt%u.unit",
+							events[i].name, j);
+				if (!str)
+					goto err;
+
+				*attr_iter++ = &pmu_iter->attr.attr;
+				pmu_iter = add_pmu_attr(pmu_iter, str,
+							events[i].unit);
+			}
 		}
 	}
 
-- 
2.36.1



* [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for Add MTL PMU support for multi-gt (rev2)
  2023-05-06  0:58 [Intel-gfx] [PATCH 0/6] Add MTL PMU support for multi-gt Umesh Nerlige Ramappa
                   ` (5 preceding siblings ...)
  2023-05-06  0:58 ` [Intel-gfx] [PATCH 6/6] drm/i915/pmu: Export counters from all tiles Umesh Nerlige Ramappa
@ 2023-05-06  2:20 ` Patchwork
  2023-05-06  2:21 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
  2023-05-06  2:38 ` [Intel-gfx] ✗ Fi.CI.BAT: failure " Patchwork
  8 siblings, 0 replies; 45+ messages in thread
From: Patchwork @ 2023-05-06  2:20 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-gfx

== Series Details ==

Series: Add MTL PMU support for multi-gt (rev2)
URL   : https://patchwork.freedesktop.org/series/115836/
State : warning

== Summary ==

Error: dim checkpatch failed
4fcbc0ab4f43 drm/i915/pmu: Support PMU for all engines
64197b1bcaba drm/i915/pmu: Skip sampling engines with no enabled counters
aefd29011a00 drm/i915/pmu: Transform PMU parking code to be GT based
1a9fda0a5224 drm/i915/pmu: Add reference counting to the sampling timer
ff7ecb6be508 drm/i915/pmu: Prepare for multi-tile non-engine counters
-:60: WARNING:AVOID_BUG: Do not crash the kernel unless it is absolutely unavoidable--use WARN_ON_ONCE() plus recovery code (if feasible) instead of BUG() or variants
#60: FILE: drivers/gpu/drm/i915/i915_pmu.c:99:
+		GEM_BUG_ON(config_gt_id(config));

-:109: WARNING:AVOID_BUG: Do not crash the kernel unless it is absolutely unavoidable--use WARN_ON_ONCE() plus recovery code (if feasible) instead of BUG() or variants
#109: FILE: drivers/gpu/drm/i915/i915_pmu.c:197:
+	GEM_BUG_ON(idx >= ARRAY_SIZE(pmu->sample));

total: 0 errors, 2 warnings, 0 checks, 342 lines checked
94d6b886eeaf drm/i915/pmu: Export counters from all tiles




* [Intel-gfx] ✗ Fi.CI.SPARSE: warning for Add MTL PMU support for multi-gt (rev2)
  2023-05-06  0:58 [Intel-gfx] [PATCH 0/6] Add MTL PMU support for multi-gt Umesh Nerlige Ramappa
                   ` (6 preceding siblings ...)
  2023-05-06  2:20 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for Add MTL PMU support for multi-gt (rev2) Patchwork
@ 2023-05-06  2:21 ` Patchwork
  2023-05-06  2:38 ` [Intel-gfx] ✗ Fi.CI.BAT: failure " Patchwork
  8 siblings, 0 replies; 45+ messages in thread
From: Patchwork @ 2023-05-06  2:21 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-gfx

== Series Details ==

Series: Add MTL PMU support for multi-gt (rev2)
URL   : https://patchwork.freedesktop.org/series/115836/
State : warning

== Summary ==

Error: dim sparse failed
Sparse version: v0.6.2
Fast mode used, each commit won't be checked separately.




* [Intel-gfx] ✗ Fi.CI.BAT: failure for Add MTL PMU support for multi-gt (rev2)
  2023-05-06  0:58 [Intel-gfx] [PATCH 0/6] Add MTL PMU support for multi-gt Umesh Nerlige Ramappa
                   ` (7 preceding siblings ...)
  2023-05-06  2:21 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
@ 2023-05-06  2:38 ` Patchwork
  8 siblings, 0 replies; 45+ messages in thread
From: Patchwork @ 2023-05-06  2:38 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-gfx

[-- Attachment #1: Type: text/plain, Size: 6070 bytes --]

== Series Details ==

Series: Add MTL PMU support for multi-gt (rev2)
URL   : https://patchwork.freedesktop.org/series/115836/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_13115 -> Patchwork_115836v2
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with Patchwork_115836v2 absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_115836v2, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_115836v2/index.html

Participating hosts (40 -> 40)
------------------------------

  Additional (1): fi-kbl-soraka 
  Missing    (1): fi-snb-2520m 

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_115836v2:

### IGT changes ###

#### Possible regressions ####

  * igt@kms_addfb_basic@addfb25-x-tiled-legacy:
    - fi-kbl-soraka:      NOTRUN -> [INCOMPLETE][1]
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_115836v2/fi-kbl-soraka/igt@kms_addfb_basic@addfb25-x-tiled-legacy.html

  
Known issues
------------

  Here are the changes found in Patchwork_115836v2 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@gem_huc_copy@huc-copy:
    - fi-kbl-soraka:      NOTRUN -> [SKIP][2] ([fdo#109271] / [i915#2190])
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_115836v2/fi-kbl-soraka/igt@gem_huc_copy@huc-copy.html

  * igt@gem_lmem_swapping@basic:
    - fi-kbl-soraka:      NOTRUN -> [SKIP][3] ([fdo#109271] / [i915#4613]) +3 similar issues
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_115836v2/fi-kbl-soraka/igt@gem_lmem_swapping@basic.html

  * igt@i915_selftest@live@gt_pm:
    - fi-kbl-soraka:      NOTRUN -> [DMESG-FAIL][4] ([i915#1886] / [i915#7913])
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_115836v2/fi-kbl-soraka/igt@i915_selftest@live@gt_pm.html

  * igt@i915_selftest@live@mman:
    - bat-rpls-1:         [PASS][5] -> [TIMEOUT][6] ([i915#6794] / [i915#7392])
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13115/bat-rpls-1/igt@i915_selftest@live@mman.html
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_115836v2/bat-rpls-1/igt@i915_selftest@live@mman.html

  * igt@kms_chamelium_frames@hdmi-crc-fast:
    - fi-kbl-soraka:      NOTRUN -> [SKIP][7] ([fdo#109271]) +16 similar issues
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_115836v2/fi-kbl-soraka/igt@kms_chamelium_frames@hdmi-crc-fast.html

  * igt@kms_chamelium_hpd@common-hpd-after-suspend:
    - fi-bsw-n3050:       NOTRUN -> [SKIP][8] ([fdo#109271])
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_115836v2/fi-bsw-n3050/igt@kms_chamelium_hpd@common-hpd-after-suspend.html

  
#### Possible fixes ####

  * igt@i915_selftest@live@execlists:
    - fi-bsw-n3050:       [ABORT][9] ([i915#7913]) -> [PASS][10]
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13115/fi-bsw-n3050/igt@i915_selftest@live@execlists.html
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_115836v2/fi-bsw-n3050/igt@i915_selftest@live@execlists.html

  * igt@i915_selftest@live@requests:
    - {bat-mtlp-8}:       [ABORT][11] ([i915#4983] / [i915#7920]) -> [PASS][12]
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13115/bat-mtlp-8/igt@i915_selftest@live@requests.html
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_115836v2/bat-mtlp-8/igt@i915_selftest@live@requests.html

  * igt@i915_selftest@live@slpc:
    - bat-rpls-2:         [DMESG-WARN][13] ([i915#6367]) -> [PASS][14]
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13115/bat-rpls-2/igt@i915_selftest@live@slpc.html
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_115836v2/bat-rpls-2/igt@i915_selftest@live@slpc.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [i915#1886]: https://gitlab.freedesktop.org/drm/intel/issues/1886
  [i915#2190]: https://gitlab.freedesktop.org/drm/intel/issues/2190
  [i915#4613]: https://gitlab.freedesktop.org/drm/intel/issues/4613
  [i915#4983]: https://gitlab.freedesktop.org/drm/intel/issues/4983
  [i915#6367]: https://gitlab.freedesktop.org/drm/intel/issues/6367
  [i915#6645]: https://gitlab.freedesktop.org/drm/intel/issues/6645
  [i915#6794]: https://gitlab.freedesktop.org/drm/intel/issues/6794
  [i915#7392]: https://gitlab.freedesktop.org/drm/intel/issues/7392
  [i915#7699]: https://gitlab.freedesktop.org/drm/intel/issues/7699
  [i915#7828]: https://gitlab.freedesktop.org/drm/intel/issues/7828
  [i915#7913]: https://gitlab.freedesktop.org/drm/intel/issues/7913
  [i915#7920]: https://gitlab.freedesktop.org/drm/intel/issues/7920


Build changes
-------------

  * IGT: IGT_7281 -> IGTPW_8924
  * Linux: CI_DRM_13115 -> Patchwork_115836v2

  CI-20190529: 20190529
  CI_DRM_13115: e0ccca9f289364f4e54b826c3f1feebbf121eaec @ git://anongit.freedesktop.org/gfx-ci/linux
  IGTPW_8924: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8924/index.html
  IGT_7281: 9e9cd7e69a393b7cce8fc12fce409eb59817dd7e @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
  Patchwork_115836v2: e0ccca9f289364f4e54b826c3f1feebbf121eaec @ git://anongit.freedesktop.org/gfx-ci/linux


### Linux commits

b2e2aa087fe8 drm/i915/pmu: Export counters from all tiles
52377aaf5b89 drm/i915/pmu: Prepare for multi-tile non-engine counters
ab3592c92c85 drm/i915/pmu: Add reference counting to the sampling timer
ce9489206e51 drm/i915/pmu: Transform PMU parking code to be GT based
07540289135e drm/i915/pmu: Skip sampling engines with no enabled counters
bba82701c1a6 drm/i915/pmu: Support PMU for all engines

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_115836v2/index.html

[-- Attachment #2: Type: text/html, Size: 7046 bytes --]


* Re: [Intel-gfx] [PATCH 1/6] drm/i915/pmu: Support PMU for all engines
  2023-05-06  0:58 ` [Intel-gfx] [PATCH 1/6] drm/i915/pmu: Support PMU for all engines Umesh Nerlige Ramappa
@ 2023-05-08 17:52   ` Umesh Nerlige Ramappa
  2023-05-09 12:26     ` Tvrtko Ursulin
  0 siblings, 1 reply; 45+ messages in thread
From: Umesh Nerlige Ramappa @ 2023-05-08 17:52 UTC (permalink / raw)
  To: intel-gfx, Tvrtko Ursulin, Ashutosh Dixit

On Fri, May 05, 2023 at 05:58:11PM -0700, Umesh Nerlige Ramappa wrote:
>From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>
>Given how the metrics are already exported, we also need to run sampling
>over engines from all GTs.
>
>The problem of GT frequencies is left for later.
>
>Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
>---
> drivers/gpu/drm/i915/i915_pmu.c | 13 ++++++++++---
> 1 file changed, 10 insertions(+), 3 deletions(-)
>
>diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
>index 7ece883a7d95..67fa6cd77529 100644
>--- a/drivers/gpu/drm/i915/i915_pmu.c
>+++ b/drivers/gpu/drm/i915/i915_pmu.c
>@@ -10,6 +10,7 @@
> #include "gt/intel_engine_pm.h"
> #include "gt/intel_engine_regs.h"
> #include "gt/intel_engine_user.h"
>+#include "gt/intel_gt.h"
> #include "gt/intel_gt_pm.h"
> #include "gt/intel_gt_regs.h"
> #include "gt/intel_rc6.h"
>@@ -414,8 +415,9 @@ static enum hrtimer_restart i915_sample(struct hrtimer *hrtimer)
> 	struct drm_i915_private *i915 =
> 		container_of(hrtimer, struct drm_i915_private, pmu.timer);
> 	struct i915_pmu *pmu = &i915->pmu;
>-	struct intel_gt *gt = to_gt(i915);
> 	unsigned int period_ns;
>+	struct intel_gt *gt;
>+	unsigned int i;
> 	ktime_t now;
>
> 	if (!READ_ONCE(pmu->timer_enabled))
>@@ -431,8 +433,13 @@ static enum hrtimer_restart i915_sample(struct hrtimer *hrtimer)
> 	 * grabbing the forcewake. However the potential error from timer call-
> 	 * back delay greatly dominates this so we keep it simple.
> 	 */
>-	engines_sample(gt, period_ns);
>-	frequency_sample(gt, period_ns);
>+
>+	for_each_gt(gt, i915, i) {
>+		engines_sample(gt, period_ns);
>+
>+		if (i == 0) /* FIXME */
>+			frequency_sample(gt, period_ns);

If the current series already handles the FIXME in a later patch, I
would just change this to a comment - /* Support gt0 for now */

With or without that, this is

Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

@Tvrtko, note that I am only transporting the patches (unmodified) from
internal to upstream, so I am assuming I am still a valid reviewer. If not,
let me know.

Thanks,
Umesh

>+	}
>
> 	hrtimer_forward(hrtimer, now, ns_to_ktime(PERIOD));
>
>-- 
>2.36.1
>


* Re: [Intel-gfx] [PATCH 2/6] drm/i915/pmu: Skip sampling engines with no enabled counters
  2023-05-06  0:58 ` [Intel-gfx] [PATCH 2/6] drm/i915/pmu: Skip sampling engines with no enabled counters Umesh Nerlige Ramappa
@ 2023-05-08 17:53   ` Umesh Nerlige Ramappa
  0 siblings, 0 replies; 45+ messages in thread
From: Umesh Nerlige Ramappa @ 2023-05-08 17:53 UTC (permalink / raw)
  To: intel-gfx, Tvrtko Ursulin, Ashutosh Dixit

On Fri, May 05, 2023 at 05:58:12PM -0700, Umesh Nerlige Ramappa wrote:
>From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>
>As we have more and more engines, do not waste time sampling the ones
>no one is monitoring.
>
>Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
>---
> drivers/gpu/drm/i915/i915_pmu.c | 3 +++
> 1 file changed, 3 insertions(+)
>
>diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
>index 67fa6cd77529..ba769f7fc385 100644
>--- a/drivers/gpu/drm/i915/i915_pmu.c
>+++ b/drivers/gpu/drm/i915/i915_pmu.c
>@@ -339,6 +339,9 @@ engines_sample(struct intel_gt *gt, unsigned int period_ns)
> 		return;
>
> 	for_each_engine(engine, gt, id) {
>+		if (!engine->pmu.enable)
>+			continue;
>+

Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

Thanks,
Umesh
> 		if (!intel_engine_pm_get_if_awake(engine))
> 			continue;
>
>-- 
>2.36.1
>


* Re: [Intel-gfx] [PATCH 3/6] drm/i915/pmu: Transform PMU parking code to be GT based
  2023-05-06  0:58 ` [Intel-gfx] [PATCH 3/6] drm/i915/pmu: Transform PMU parking code to be GT based Umesh Nerlige Ramappa
@ 2023-05-08 17:55   ` Umesh Nerlige Ramappa
  2023-05-09 15:10     ` Umesh Nerlige Ramappa
  0 siblings, 1 reply; 45+ messages in thread
From: Umesh Nerlige Ramappa @ 2023-05-08 17:55 UTC (permalink / raw)
  To: intel-gfx, Tvrtko Ursulin, Ashutosh Dixit

On Fri, May 05, 2023 at 05:58:13PM -0700, Umesh Nerlige Ramappa wrote:
>From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>
>Trivial prep work for full multi-tile enablement later.

Some more description of what this does, or of how park/unpark affects
PMU counters, would help.

Thanks,
Umesh

>
>Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
>Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
>---
> drivers/gpu/drm/i915/gt/intel_gt_pm.c |  4 ++--
> drivers/gpu/drm/i915/i915_pmu.c       | 16 ++++++++--------
> drivers/gpu/drm/i915/i915_pmu.h       |  9 +++++----
> 3 files changed, 15 insertions(+), 14 deletions(-)
>
>diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.c b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
>index e02cb90723ae..c2e69bafd02b 100644
>--- a/drivers/gpu/drm/i915/gt/intel_gt_pm.c
>+++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
>@@ -87,7 +87,7 @@ static int __gt_unpark(struct intel_wakeref *wf)
>
> 	intel_rc6_unpark(&gt->rc6);
> 	intel_rps_unpark(&gt->rps);
>-	i915_pmu_gt_unparked(i915);
>+	i915_pmu_gt_unparked(gt);
> 	intel_guc_busyness_unpark(gt);
>
> 	intel_gt_unpark_requests(gt);
>@@ -109,7 +109,7 @@ static int __gt_park(struct intel_wakeref *wf)
>
> 	intel_guc_busyness_park(gt);
> 	i915_vma_parked(gt);
>-	i915_pmu_gt_parked(i915);
>+	i915_pmu_gt_parked(gt);
> 	intel_rps_park(&gt->rps);
> 	intel_rc6_park(&gt->rc6);
>
>diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
>index ba769f7fc385..2b63ee31e1b3 100644
>--- a/drivers/gpu/drm/i915/i915_pmu.c
>+++ b/drivers/gpu/drm/i915/i915_pmu.c
>@@ -217,11 +217,11 @@ static void init_rc6(struct i915_pmu *pmu)
> 	}
> }
>
>-static void park_rc6(struct drm_i915_private *i915)
>+static void park_rc6(struct intel_gt *gt)
> {
>-	struct i915_pmu *pmu = &i915->pmu;
>+	struct i915_pmu *pmu = &gt->i915->pmu;
>
>-	pmu->sample[__I915_SAMPLE_RC6].cur = __get_rc6(to_gt(i915));
>+	pmu->sample[__I915_SAMPLE_RC6].cur = __get_rc6(gt);
> 	pmu->sleep_last = ktime_get_raw();
> }
>
>@@ -236,16 +236,16 @@ static void __i915_pmu_maybe_start_timer(struct i915_pmu *pmu)
> 	}
> }
>
>-void i915_pmu_gt_parked(struct drm_i915_private *i915)
>+void i915_pmu_gt_parked(struct intel_gt *gt)
> {
>-	struct i915_pmu *pmu = &i915->pmu;
>+	struct i915_pmu *pmu = &gt->i915->pmu;
>
> 	if (!pmu->base.event_init)
> 		return;
>
> 	spin_lock_irq(&pmu->lock);
>
>-	park_rc6(i915);
>+	park_rc6(gt);
>
> 	/*
> 	 * Signal sampling timer to stop if only engine events are enabled and
>@@ -256,9 +256,9 @@ void i915_pmu_gt_parked(struct drm_i915_private *i915)
> 	spin_unlock_irq(&pmu->lock);
> }
>
>-void i915_pmu_gt_unparked(struct drm_i915_private *i915)
>+void i915_pmu_gt_unparked(struct intel_gt *gt)
> {
>-	struct i915_pmu *pmu = &i915->pmu;
>+	struct i915_pmu *pmu = &gt->i915->pmu;
>
> 	if (!pmu->base.event_init)
> 		return;
>diff --git a/drivers/gpu/drm/i915/i915_pmu.h b/drivers/gpu/drm/i915/i915_pmu.h
>index c30f43319a78..a686fd7ccedf 100644
>--- a/drivers/gpu/drm/i915/i915_pmu.h
>+++ b/drivers/gpu/drm/i915/i915_pmu.h
>@@ -13,6 +13,7 @@
> #include <uapi/drm/i915_drm.h>
>
> struct drm_i915_private;
>+struct intel_gt;
>
> /*
>  * Non-engine events that we need to track enabled-disabled transition and
>@@ -151,15 +152,15 @@ int i915_pmu_init(void);
> void i915_pmu_exit(void);
> void i915_pmu_register(struct drm_i915_private *i915);
> void i915_pmu_unregister(struct drm_i915_private *i915);
>-void i915_pmu_gt_parked(struct drm_i915_private *i915);
>-void i915_pmu_gt_unparked(struct drm_i915_private *i915);
>+void i915_pmu_gt_parked(struct intel_gt *gt);
>+void i915_pmu_gt_unparked(struct intel_gt *gt);
> #else
> static inline int i915_pmu_init(void) { return 0; }
> static inline void i915_pmu_exit(void) {}
> static inline void i915_pmu_register(struct drm_i915_private *i915) {}
> static inline void i915_pmu_unregister(struct drm_i915_private *i915) {}
>-static inline void i915_pmu_gt_parked(struct drm_i915_private *i915) {}
>-static inline void i915_pmu_gt_unparked(struct drm_i915_private *i915) {}
>+static inline void i915_pmu_gt_parked(struct intel_gt *gt) {}
>+static inline void i915_pmu_gt_unparked(struct intel_gt *gt) {}
> #endif
>
> #endif
>-- 
>2.36.1
>


* Re: [Intel-gfx] [PATCH 4/6] drm/i915/pmu: Add reference counting to the sampling timer
  2023-05-06  0:58 ` [Intel-gfx] [PATCH 4/6] drm/i915/pmu: Add reference counting to the sampling timer Umesh Nerlige Ramappa
@ 2023-05-08 17:58   ` Umesh Nerlige Ramappa
  2023-05-09 17:25   ` Dixit, Ashutosh
  2023-05-12 22:29   ` Dixit, Ashutosh
  2 siblings, 0 replies; 45+ messages in thread
From: Umesh Nerlige Ramappa @ 2023-05-08 17:58 UTC (permalink / raw)
  To: intel-gfx, Tvrtko Ursulin, Ashutosh Dixit

On Fri, May 05, 2023 at 05:58:14PM -0700, Umesh Nerlige Ramappa wrote:
>From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>
>We do not want to have timers per tile and waste CPU cycles and energy via
>multiple wake-up sources for the relatively unimportant task of PMU
>sampling, so keeping a single timer works well. But we also do not want
>the first GT which goes idle to turn off the timer.
>
>Add some reference counting, via a mask of unparked GTs, to solve this.

It looks like the previous patch is prep work for this one. I would
mention this patch in the previous patch's commit message, but I am not
sure what the norm is in these scenarios. I recently created some IGT
patches that are prep work and refer to future patches in the series.

As is, this patch is 

Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

Thanks,
Umesh

>
>Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
>---
> drivers/gpu/drm/i915/i915_pmu.c | 12 ++++++++++--
> drivers/gpu/drm/i915/i915_pmu.h |  4 ++++
> 2 files changed, 14 insertions(+), 2 deletions(-)
>
>diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
>index 2b63ee31e1b3..669a42e44082 100644
>--- a/drivers/gpu/drm/i915/i915_pmu.c
>+++ b/drivers/gpu/drm/i915/i915_pmu.c
>@@ -251,7 +251,9 @@ void i915_pmu_gt_parked(struct intel_gt *gt)
> 	 * Signal sampling timer to stop if only engine events are enabled and
> 	 * GPU went idle.
> 	 */
>-	pmu->timer_enabled = pmu_needs_timer(pmu, false);
>+	pmu->unparked &= ~BIT(gt->info.id);
>+	if (pmu->unparked == 0)
>+		pmu->timer_enabled = pmu_needs_timer(pmu, false);
>
> 	spin_unlock_irq(&pmu->lock);
> }
>@@ -268,7 +270,10 @@ void i915_pmu_gt_unparked(struct intel_gt *gt)
> 	/*
> 	 * Re-enable sampling timer when GPU goes active.
> 	 */
>-	__i915_pmu_maybe_start_timer(pmu);
>+	if (pmu->unparked == 0)
>+		__i915_pmu_maybe_start_timer(pmu);
>+
>+	pmu->unparked |= BIT(gt->info.id);
>
> 	spin_unlock_irq(&pmu->lock);
> }
>@@ -438,6 +443,9 @@ static enum hrtimer_restart i915_sample(struct hrtimer *hrtimer)
> 	 */
>
> 	for_each_gt(gt, i915, i) {
>+		if (!(pmu->unparked & BIT(i)))
>+			continue;
>+
> 		engines_sample(gt, period_ns);
>
> 		if (i == 0) /* FIXME */
>diff --git a/drivers/gpu/drm/i915/i915_pmu.h b/drivers/gpu/drm/i915/i915_pmu.h
>index a686fd7ccedf..3a811266ac6a 100644
>--- a/drivers/gpu/drm/i915/i915_pmu.h
>+++ b/drivers/gpu/drm/i915/i915_pmu.h
>@@ -76,6 +76,10 @@ struct i915_pmu {
> 	 * @lock: Lock protecting enable mask and ref count handling.
> 	 */
> 	spinlock_t lock;
>+	/**
>+	 * @unparked: GT unparked mask.
>+	 */
>+	unsigned int unparked;
> 	/**
> 	 * @timer: Timer for internal i915 PMU sampling.
> 	 */
>-- 
>2.36.1
>


* Re: [Intel-gfx] [PATCH 5/6] drm/i915/pmu: Prepare for multi-tile non-engine counters
  2023-05-06  0:58 ` [Intel-gfx] [PATCH 5/6] drm/i915/pmu: Prepare for multi-tile non-engine counters Umesh Nerlige Ramappa
@ 2023-05-08 18:07   ` Umesh Nerlige Ramappa
  2023-05-12  1:08   ` Dixit, Ashutosh
  1 sibling, 0 replies; 45+ messages in thread
From: Umesh Nerlige Ramappa @ 2023-05-08 18:07 UTC (permalink / raw)
  To: intel-gfx, Tvrtko Ursulin, Ashutosh Dixit

On Fri, May 05, 2023 at 05:58:15PM -0700, Umesh Nerlige Ramappa wrote:
>From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>
>Reserve some bits in the counter config namespace which will carry the
>tile id and prepare the code to handle this.
>
>No per-tile counters have been added yet.
>
>v2:
>- Fix checkpatch issues
>- Use 4 bits for gt id in non-engine counters. Drop FIXME.
>- Set MAX GTs to 4. Drop FIXME.

I touched this one, so I cannot review it. Also, a reminder to myself to
add the link to the UMD changes (intel_gpu_top) here.

Umesh

>
>Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
>---
> drivers/gpu/drm/i915/i915_pmu.c | 150 +++++++++++++++++++++++---------
> drivers/gpu/drm/i915/i915_pmu.h |   9 +-
> include/uapi/drm/i915_drm.h     |  17 +++-
> 3 files changed, 129 insertions(+), 47 deletions(-)
>
>diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
>index 669a42e44082..12b2f3169abf 100644
>--- a/drivers/gpu/drm/i915/i915_pmu.c
>+++ b/drivers/gpu/drm/i915/i915_pmu.c
>@@ -56,11 +56,21 @@ static bool is_engine_config(u64 config)
> 	return config < __I915_PMU_OTHER(0);
> }
>
>+static unsigned int config_gt_id(const u64 config)
>+{
>+	return config >> __I915_PMU_GT_SHIFT;
>+}
>+
>+static u64 config_counter(const u64 config)
>+{
>+	return config & ~(~0ULL << __I915_PMU_GT_SHIFT);
>+}
>+
> static unsigned int other_bit(const u64 config)
> {
> 	unsigned int val;
>
>-	switch (config) {
>+	switch (config_counter(config)) {
> 	case I915_PMU_ACTUAL_FREQUENCY:
> 		val =  __I915_PMU_ACTUAL_FREQUENCY_ENABLED;
> 		break;
>@@ -78,15 +88,20 @@ static unsigned int other_bit(const u64 config)
> 		return -1;
> 	}
>
>-	return I915_ENGINE_SAMPLE_COUNT + val;
>+	return I915_ENGINE_SAMPLE_COUNT +
>+	       config_gt_id(config) * __I915_PMU_TRACKED_EVENT_COUNT +
>+	       val;
> }
>
> static unsigned int config_bit(const u64 config)
> {
>-	if (is_engine_config(config))
>+	if (is_engine_config(config)) {
>+		GEM_BUG_ON(config_gt_id(config));
>+
> 		return engine_config_sample(config);
>-	else
>+	} else {
> 		return other_bit(config);
>+	}
> }
>
> static u64 config_mask(u64 config)
>@@ -104,6 +119,18 @@ static unsigned int event_bit(struct perf_event *event)
> 	return config_bit(event->attr.config);
> }
>
>+static u64 frequency_enabled_mask(void)
>+{
>+	unsigned int i;
>+	u64 mask = 0;
>+
>+	for (i = 0; i < I915_PMU_MAX_GTS; i++)
>+		mask |= config_mask(__I915_PMU_ACTUAL_FREQUENCY(i)) |
>+			config_mask(__I915_PMU_REQUESTED_FREQUENCY(i));
>+
>+	return mask;
>+}
>+
> static bool pmu_needs_timer(struct i915_pmu *pmu, bool gpu_active)
> {
> 	struct drm_i915_private *i915 = container_of(pmu, typeof(*i915), pmu);
>@@ -120,9 +147,7 @@ static bool pmu_needs_timer(struct i915_pmu *pmu, bool gpu_active)
> 	 * Mask out all the ones which do not need the timer, or in
> 	 * other words keep all the ones that could need the timer.
> 	 */
>-	enable &= config_mask(I915_PMU_ACTUAL_FREQUENCY) |
>-		  config_mask(I915_PMU_REQUESTED_FREQUENCY) |
>-		  ENGINE_SAMPLE_MASK;
>+	enable &= frequency_enabled_mask() | ENGINE_SAMPLE_MASK;
>
> 	/*
> 	 * When the GPU is idle per-engine counters do not need to be
>@@ -164,9 +189,37 @@ static inline s64 ktime_since_raw(const ktime_t kt)
> 	return ktime_to_ns(ktime_sub(ktime_get_raw(), kt));
> }
>
>+static unsigned int
>+__sample_idx(struct i915_pmu *pmu, unsigned int gt_id, int sample)
>+{
>+	unsigned int idx = gt_id * __I915_NUM_PMU_SAMPLERS + sample;
>+
>+	GEM_BUG_ON(idx >= ARRAY_SIZE(pmu->sample));
>+
>+	return idx;
>+}
>+
>+static u64 read_sample(struct i915_pmu *pmu, unsigned int gt_id, int sample)
>+{
>+	return pmu->sample[__sample_idx(pmu, gt_id, sample)].cur;
>+}
>+
>+static void
>+store_sample(struct i915_pmu *pmu, unsigned int gt_id, int sample, u64 val)
>+{
>+	pmu->sample[__sample_idx(pmu, gt_id, sample)].cur = val;
>+}
>+
>+static void
>+add_sample_mult(struct i915_pmu *pmu, unsigned int gt_id, int sample, u32 val, u32 mul)
>+{
>+	pmu->sample[__sample_idx(pmu, gt_id, sample)].cur += mul_u32_u32(val, mul);
>+}
>+
> static u64 get_rc6(struct intel_gt *gt)
> {
> 	struct drm_i915_private *i915 = gt->i915;
>+	const unsigned int gt_id = gt->info.id;
> 	struct i915_pmu *pmu = &i915->pmu;
> 	unsigned long flags;
> 	bool awake = false;
>@@ -181,7 +234,7 @@ static u64 get_rc6(struct intel_gt *gt)
> 	spin_lock_irqsave(&pmu->lock, flags);
>
> 	if (awake) {
>-		pmu->sample[__I915_SAMPLE_RC6].cur = val;
>+		store_sample(pmu, gt_id, __I915_SAMPLE_RC6, val);
> 	} else {
> 		/*
> 		 * We think we are runtime suspended.
>@@ -190,14 +243,14 @@ static u64 get_rc6(struct intel_gt *gt)
> 		 * on top of the last known real value, as the approximated RC6
> 		 * counter value.
> 		 */
>-		val = ktime_since_raw(pmu->sleep_last);
>-		val += pmu->sample[__I915_SAMPLE_RC6].cur;
>+		val = ktime_since_raw(pmu->sleep_last[gt_id]);
>+		val += read_sample(pmu, gt_id, __I915_SAMPLE_RC6);
> 	}
>
>-	if (val < pmu->sample[__I915_SAMPLE_RC6_LAST_REPORTED].cur)
>-		val = pmu->sample[__I915_SAMPLE_RC6_LAST_REPORTED].cur;
>+	if (val < read_sample(pmu, gt_id, __I915_SAMPLE_RC6_LAST_REPORTED))
>+		val = read_sample(pmu, gt_id, __I915_SAMPLE_RC6_LAST_REPORTED);
> 	else
>-		pmu->sample[__I915_SAMPLE_RC6_LAST_REPORTED].cur = val;
>+		store_sample(pmu, gt_id, __I915_SAMPLE_RC6_LAST_REPORTED, val);
>
> 	spin_unlock_irqrestore(&pmu->lock, flags);
>
>@@ -207,13 +260,20 @@ static u64 get_rc6(struct intel_gt *gt)
> static void init_rc6(struct i915_pmu *pmu)
> {
> 	struct drm_i915_private *i915 = container_of(pmu, typeof(*i915), pmu);
>-	intel_wakeref_t wakeref;
>+	struct intel_gt *gt;
>+	unsigned int i;
>+
>+	for_each_gt(gt, i915, i) {
>+		intel_wakeref_t wakeref;
>
>-	with_intel_runtime_pm(to_gt(i915)->uncore->rpm, wakeref) {
>-		pmu->sample[__I915_SAMPLE_RC6].cur = __get_rc6(to_gt(i915));
>-		pmu->sample[__I915_SAMPLE_RC6_LAST_REPORTED].cur =
>-					pmu->sample[__I915_SAMPLE_RC6].cur;
>-		pmu->sleep_last = ktime_get_raw();
>+		with_intel_runtime_pm(gt->uncore->rpm, wakeref) {
>+			u64 val = __get_rc6(gt);
>+
>+			store_sample(pmu, i, __I915_SAMPLE_RC6, val);
>+			store_sample(pmu, i, __I915_SAMPLE_RC6_LAST_REPORTED,
>+				     val);
>+			pmu->sleep_last[i] = ktime_get_raw();
>+		}
> 	}
> }
>
>@@ -221,8 +281,8 @@ static void park_rc6(struct intel_gt *gt)
> {
> 	struct i915_pmu *pmu = &gt->i915->pmu;
>
>-	pmu->sample[__I915_SAMPLE_RC6].cur = __get_rc6(gt);
>-	pmu->sleep_last = ktime_get_raw();
>+	store_sample(pmu, gt->info.id, __I915_SAMPLE_RC6, __get_rc6(gt));
>+	pmu->sleep_last[gt->info.id] = ktime_get_raw();
> }
>
> static void __i915_pmu_maybe_start_timer(struct i915_pmu *pmu)
>@@ -362,34 +422,30 @@ engines_sample(struct intel_gt *gt, unsigned int period_ns)
> 	}
> }
>
>-static void
>-add_sample_mult(struct i915_pmu_sample *sample, u32 val, u32 mul)
>-{
>-	sample->cur += mul_u32_u32(val, mul);
>-}
>-
>-static bool frequency_sampling_enabled(struct i915_pmu *pmu)
>+static bool
>+frequency_sampling_enabled(struct i915_pmu *pmu, unsigned int gt)
> {
> 	return pmu->enable &
>-	       (config_mask(I915_PMU_ACTUAL_FREQUENCY) |
>-		config_mask(I915_PMU_REQUESTED_FREQUENCY));
>+	       (config_mask(__I915_PMU_ACTUAL_FREQUENCY(gt)) |
>+		config_mask(__I915_PMU_REQUESTED_FREQUENCY(gt)));
> }
>
> static void
> frequency_sample(struct intel_gt *gt, unsigned int period_ns)
> {
> 	struct drm_i915_private *i915 = gt->i915;
>+	const unsigned int gt_id = gt->info.id;
> 	struct i915_pmu *pmu = &i915->pmu;
> 	struct intel_rps *rps = &gt->rps;
>
>-	if (!frequency_sampling_enabled(pmu))
>+	if (!frequency_sampling_enabled(pmu, gt_id))
> 		return;
>
> 	/* Report 0/0 (actual/requested) frequency while parked. */
> 	if (!intel_gt_pm_get_if_awake(gt))
> 		return;
>
>-	if (pmu->enable & config_mask(I915_PMU_ACTUAL_FREQUENCY)) {
>+	if (pmu->enable & config_mask(__I915_PMU_ACTUAL_FREQUENCY(gt_id))) {
> 		u32 val;
>
> 		/*
>@@ -405,12 +461,12 @@ frequency_sample(struct intel_gt *gt, unsigned int period_ns)
> 		if (!val)
> 			val = intel_gpu_freq(rps, rps->cur_freq);
>
>-		add_sample_mult(&pmu->sample[__I915_SAMPLE_FREQ_ACT],
>+		add_sample_mult(pmu, gt_id, __I915_SAMPLE_FREQ_ACT,
> 				val, period_ns / 1000);
> 	}
>
>-	if (pmu->enable & config_mask(I915_PMU_REQUESTED_FREQUENCY)) {
>-		add_sample_mult(&pmu->sample[__I915_SAMPLE_FREQ_REQ],
>+	if (pmu->enable & config_mask(__I915_PMU_REQUESTED_FREQUENCY(gt_id))) {
>+		add_sample_mult(pmu, gt_id, __I915_SAMPLE_FREQ_REQ,
> 				intel_rps_get_requested_frequency(rps),
> 				period_ns / 1000);
> 	}
>@@ -447,9 +503,7 @@ static enum hrtimer_restart i915_sample(struct hrtimer *hrtimer)
> 			continue;
>
> 		engines_sample(gt, period_ns);
>-
>-		if (i == 0) /* FIXME */
>-			frequency_sample(gt, period_ns);
>+		frequency_sample(gt, period_ns);
> 	}
>
> 	hrtimer_forward(hrtimer, now, ns_to_ktime(PERIOD));
>@@ -491,7 +545,12 @@ config_status(struct drm_i915_private *i915, u64 config)
> {
> 	struct intel_gt *gt = to_gt(i915);
>
>-	switch (config) {
>+	unsigned int gt_id = config_gt_id(config);
>+
>+	if (gt_id)
>+		return -ENOENT;
>+
>+	switch (config_counter(config)) {
> 	case I915_PMU_ACTUAL_FREQUENCY:
> 		if (IS_VALLEYVIEW(i915) || IS_CHERRYVIEW(i915))
> 			/* Requires a mutex for sampling! */
>@@ -599,22 +658,27 @@ static u64 __i915_pmu_event_read(struct perf_event *event)
> 			val = engine->pmu.sample[sample].cur;
> 		}
> 	} else {
>-		switch (event->attr.config) {
>+		const unsigned int gt_id = config_gt_id(event->attr.config);
>+		const u64 config = config_counter(event->attr.config);
>+
>+		switch (config) {
> 		case I915_PMU_ACTUAL_FREQUENCY:
> 			val =
>-			   div_u64(pmu->sample[__I915_SAMPLE_FREQ_ACT].cur,
>+			   div_u64(read_sample(pmu, gt_id,
>+					       __I915_SAMPLE_FREQ_ACT),
> 				   USEC_PER_SEC /* to MHz */);
> 			break;
> 		case I915_PMU_REQUESTED_FREQUENCY:
> 			val =
>-			   div_u64(pmu->sample[__I915_SAMPLE_FREQ_REQ].cur,
>+			   div_u64(read_sample(pmu, gt_id,
>+					       __I915_SAMPLE_FREQ_REQ),
> 				   USEC_PER_SEC /* to MHz */);
> 			break;
> 		case I915_PMU_INTERRUPTS:
> 			val = READ_ONCE(pmu->irq_count);
> 			break;
> 		case I915_PMU_RC6_RESIDENCY:
>-			val = get_rc6(to_gt(i915));
>+			val = get_rc6(i915->gt[gt_id]);
> 			break;
> 		case I915_PMU_SOFTWARE_GT_AWAKE_TIME:
> 			val = ktime_to_ns(intel_gt_get_awake_time(to_gt(i915)));
>diff --git a/drivers/gpu/drm/i915/i915_pmu.h b/drivers/gpu/drm/i915/i915_pmu.h
>index 3a811266ac6a..d47846f21ddf 100644
>--- a/drivers/gpu/drm/i915/i915_pmu.h
>+++ b/drivers/gpu/drm/i915/i915_pmu.h
>@@ -38,13 +38,16 @@ enum {
> 	__I915_NUM_PMU_SAMPLERS
> };
>
>+#define I915_PMU_MAX_GTS (4)
>+
> /*
>  * How many different events we track in the global PMU mask.
>  *
>  * It is also used to know to needed number of event reference counters.
>  */
> #define I915_PMU_MASK_BITS \
>-	(I915_ENGINE_SAMPLE_COUNT + __I915_PMU_TRACKED_EVENT_COUNT)
>+	(I915_ENGINE_SAMPLE_COUNT + \
>+	 I915_PMU_MAX_GTS * __I915_PMU_TRACKED_EVENT_COUNT)
>
> #define I915_ENGINE_SAMPLE_COUNT (I915_SAMPLE_SEMA + 1)
>
>@@ -124,11 +127,11 @@ struct i915_pmu {
> 	 * Only global counters are held here, while the per-engine ones are in
> 	 * struct intel_engine_cs.
> 	 */
>-	struct i915_pmu_sample sample[__I915_NUM_PMU_SAMPLERS];
>+	struct i915_pmu_sample sample[I915_PMU_MAX_GTS * __I915_NUM_PMU_SAMPLERS];
> 	/**
> 	 * @sleep_last: Last time GT parked for RC6 estimation.
> 	 */
>-	ktime_t sleep_last;
>+	ktime_t sleep_last[I915_PMU_MAX_GTS];
> 	/**
> 	 * @irq_count: Number of interrupts
> 	 *
>diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
>index dba7c5a5b25e..d5ac1fdeb2b1 100644
>--- a/include/uapi/drm/i915_drm.h
>+++ b/include/uapi/drm/i915_drm.h
>@@ -280,7 +280,16 @@ enum drm_i915_pmu_engine_sample {
> #define I915_PMU_ENGINE_SEMA(class, instance) \
> 	__I915_PMU_ENGINE(class, instance, I915_SAMPLE_SEMA)
>
>-#define __I915_PMU_OTHER(x) (__I915_PMU_ENGINE(0xff, 0xff, 0xf) + 1 + (x))
>+/*
>+ * Top 4 bits of every non-engine counter are GT id.
>+ */
>+#define __I915_PMU_GT_SHIFT (60)
>+
>+#define ___I915_PMU_OTHER(gt, x) \
>+	(((__u64)__I915_PMU_ENGINE(0xff, 0xff, 0xf) + 1 + (x)) | \
>+	((__u64)(gt) << __I915_PMU_GT_SHIFT))
>+
>+#define __I915_PMU_OTHER(x) ___I915_PMU_OTHER(0, x)
>
> #define I915_PMU_ACTUAL_FREQUENCY	__I915_PMU_OTHER(0)
> #define I915_PMU_REQUESTED_FREQUENCY	__I915_PMU_OTHER(1)
>@@ -290,6 +299,12 @@ enum drm_i915_pmu_engine_sample {
>
> #define I915_PMU_LAST /* Deprecated - do not use */ I915_PMU_RC6_RESIDENCY
>
>+#define __I915_PMU_ACTUAL_FREQUENCY(gt)		___I915_PMU_OTHER(gt, 0)
>+#define __I915_PMU_REQUESTED_FREQUENCY(gt)	___I915_PMU_OTHER(gt, 1)
>+#define __I915_PMU_INTERRUPTS(gt)		___I915_PMU_OTHER(gt, 2)
>+#define __I915_PMU_RC6_RESIDENCY(gt)		___I915_PMU_OTHER(gt, 3)
>+#define __I915_PMU_SOFTWARE_GT_AWAKE_TIME(gt)	___I915_PMU_OTHER(gt, 4)
>+
> /* Each region is a minimum of 16k, and there are at most 255 of them.
>  */
> #define I915_NR_TEX_REGIONS 255	/* table size 2k - maximum due to use
>-- 
>2.36.1
>
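Assume the engine config layout from i915_drm.h uses 4 sample bits and 8 instance bits (worth double-checking against the header). Under that assumption, the uapi hunk above places the first non-engine counter at 0x100000 and ORs the GT id into bits 63..60. GT0 configs therefore keep their old values. The sketch below is self-contained and covers both the encoding and the config_gt_id()/config_counter() decoding. The EX_-prefixed names are hypothetical stand-ins for the real macros.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to mirror i915_drm.h: 4 sample bits, 8 instance bits. */
#define EX_PMU_SAMPLE_BITS          4
#define EX_PMU_SAMPLE_INSTANCE_BITS 8
#define EX_PMU_CLASS_SHIFT (EX_PMU_SAMPLE_BITS + EX_PMU_SAMPLE_INSTANCE_BITS)

#define EX_PMU_ENGINE(class, instance, sample) \
	(((uint64_t)(class) << EX_PMU_CLASS_SHIFT) | \
	 ((uint64_t)(instance) << EX_PMU_SAMPLE_BITS) | (uint64_t)(sample))

/* Top 4 bits of every non-engine counter carry the GT id. */
#define EX_PMU_GT_SHIFT 60

#define EX_PMU_OTHER(gt, x) \
	((EX_PMU_ENGINE(0xff, 0xff, 0xf) + 1 + (x)) | \
	 ((uint64_t)(gt) << EX_PMU_GT_SHIFT))

/* Non-engine counter ids must stay clear of the GT id bits. */
static_assert(EX_PMU_OTHER(0, 4) < (1ULL << EX_PMU_GT_SHIFT),
	      "counter space overlaps GT id bits");

/* Decoding, mirroring config_gt_id() in the patch. */
static unsigned int ex_config_gt_id(uint64_t config)
{
	return (unsigned int)(config >> EX_PMU_GT_SHIFT);
}

/* Decoding, mirroring config_counter() in the patch. */
static uint64_t ex_config_counter(uint64_t config)
{
	return config & ~(~0ULL << EX_PMU_GT_SHIFT);
}
```

For example, stripping the GT bits from the gt1 actual-frequency config yields the legacy gt0 actual-frequency config.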


* Re: [Intel-gfx] [PATCH 6/6] drm/i915/pmu: Export counters from all tiles
  2023-05-06  0:58 ` [Intel-gfx] [PATCH 6/6] drm/i915/pmu: Export counters from all tiles Umesh Nerlige Ramappa
@ 2023-05-08 18:08   ` Umesh Nerlige Ramappa
  2023-05-09 12:38   ` Tvrtko Ursulin
  2023-05-11 18:57   ` Dixit, Ashutosh
  2 siblings, 0 replies; 45+ messages in thread
From: Umesh Nerlige Ramappa @ 2023-05-08 18:08 UTC (permalink / raw)
  To: intel-gfx, Tvrtko Ursulin, Ashutosh Dixit

On Fri, May 05, 2023 at 05:58:16PM -0700, Umesh Nerlige Ramappa wrote:
>From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>
>Start exporting frequency and RC6 counters from all tiles.
>
>Existing counters keep their names and config values, and new ones use the
>namespace added in the previous patch, with "-gtN" appended to their
>names.
>
>The interrupts counter is the odd one out. Because it is a global device
>counter (not only GT), we choose not to add per-tile versions for now.

A link to the UMD-specific changes is needed here as well. With that:

Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

Thanks,
Umesh
>
>Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>Signed-off-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com>
>Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
>---
> drivers/gpu/drm/i915/i915_pmu.c | 87 ++++++++++++++++++++++-----------
> 1 file changed, 59 insertions(+), 28 deletions(-)
>
>diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
>index 12b2f3169abf..284e5c5b97bb 100644
>--- a/drivers/gpu/drm/i915/i915_pmu.c
>+++ b/drivers/gpu/drm/i915/i915_pmu.c
>@@ -546,8 +546,9 @@ config_status(struct drm_i915_private *i915, u64 config)
> 	struct intel_gt *gt = to_gt(i915);
>
> 	unsigned int gt_id = config_gt_id(config);
>+	unsigned int max_gt_id = HAS_EXTRA_GT_LIST(i915) ? 1 : 0;
>
>-	if (gt_id)
>+	if (gt_id > max_gt_id)
> 		return -ENOENT;
>
> 	switch (config_counter(config)) {
>@@ -561,6 +562,8 @@ config_status(struct drm_i915_private *i915, u64 config)
> 			return -ENODEV;
> 		break;
> 	case I915_PMU_INTERRUPTS:
>+		if (gt_id)
>+			return -ENOENT;
> 		break;
> 	case I915_PMU_RC6_RESIDENCY:
> 		if (!gt->rc6.supported)
>@@ -930,11 +933,20 @@ static const struct attribute_group i915_pmu_cpumask_attr_group = {
> 	.attrs = i915_cpumask_attrs,
> };
>
>-#define __event(__config, __name, __unit) \
>+#define __event(__counter, __name, __unit) \
> { \
>-	.config = (__config), \
>+	.counter = (__counter), \
> 	.name = (__name), \
> 	.unit = (__unit), \
>+	.global = false, \
>+}
>+
>+#define __global_event(__counter, __name, __unit) \
>+{ \
>+	.counter = (__counter), \
>+	.name = (__name), \
>+	.unit = (__unit), \
>+	.global = true, \
> }
>
> #define __engine_event(__sample, __name) \
>@@ -973,15 +985,16 @@ create_event_attributes(struct i915_pmu *pmu)
> {
> 	struct drm_i915_private *i915 = container_of(pmu, typeof(*i915), pmu);
> 	static const struct {
>-		u64 config;
>+		unsigned int counter;
> 		const char *name;
> 		const char *unit;
>+		bool global;
> 	} events[] = {
>-		__event(I915_PMU_ACTUAL_FREQUENCY, "actual-frequency", "M"),
>-		__event(I915_PMU_REQUESTED_FREQUENCY, "requested-frequency", "M"),
>-		__event(I915_PMU_INTERRUPTS, "interrupts", NULL),
>-		__event(I915_PMU_RC6_RESIDENCY, "rc6-residency", "ns"),
>-		__event(I915_PMU_SOFTWARE_GT_AWAKE_TIME, "software-gt-awake-time", "ns"),
>+		__event(0, "actual-frequency", "M"),
>+		__event(1, "requested-frequency", "M"),
>+		__global_event(2, "interrupts", NULL),
>+		__event(3, "rc6-residency", "ns"),
>+		__event(4, "software-gt-awake-time", "ns"),
> 	};
> 	static const struct {
> 		enum drm_i915_pmu_engine_sample sample;
>@@ -996,12 +1009,17 @@ create_event_attributes(struct i915_pmu *pmu)
> 	struct i915_ext_attribute *i915_attr = NULL, *i915_iter;
> 	struct attribute **attr = NULL, **attr_iter;
> 	struct intel_engine_cs *engine;
>-	unsigned int i;
>+	struct intel_gt *gt;
>+	unsigned int i, j;
>
> 	/* Count how many counters we will be exposing. */
>-	for (i = 0; i < ARRAY_SIZE(events); i++) {
>-		if (!config_status(i915, events[i].config))
>-			count++;
>+	for_each_gt(gt, i915, j) {
>+		for (i = 0; i < ARRAY_SIZE(events); i++) {
>+			u64 config = ___I915_PMU_OTHER(j, events[i].counter);
>+
>+			if (!config_status(i915, config))
>+				count++;
>+		}
> 	}
>
> 	for_each_uabi_engine(engine, i915) {
>@@ -1031,26 +1049,39 @@ create_event_attributes(struct i915_pmu *pmu)
> 	attr_iter = attr;
>
> 	/* Initialize supported non-engine counters. */
>-	for (i = 0; i < ARRAY_SIZE(events); i++) {
>-		char *str;
>-
>-		if (config_status(i915, events[i].config))
>-			continue;
>-
>-		str = kstrdup(events[i].name, GFP_KERNEL);
>-		if (!str)
>-			goto err;
>+	for_each_gt(gt, i915, j) {
>+		for (i = 0; i < ARRAY_SIZE(events); i++) {
>+			u64 config = ___I915_PMU_OTHER(j, events[i].counter);
>+			char *str;
>
>-		*attr_iter++ = &i915_iter->attr.attr;
>-		i915_iter = add_i915_attr(i915_iter, str, events[i].config);
>+			if (config_status(i915, config))
>+				continue;
>
>-		if (events[i].unit) {
>-			str = kasprintf(GFP_KERNEL, "%s.unit", events[i].name);
>+			if (events[i].global || !HAS_EXTRA_GT_LIST(i915))
>+				str = kstrdup(events[i].name, GFP_KERNEL);
>+			else
>+				str = kasprintf(GFP_KERNEL, "%s-gt%u",
>+						events[i].name, j);
> 			if (!str)
> 				goto err;
>
>-			*attr_iter++ = &pmu_iter->attr.attr;
>-			pmu_iter = add_pmu_attr(pmu_iter, str, events[i].unit);
>+			*attr_iter++ = &i915_iter->attr.attr;
>+			i915_iter = add_i915_attr(i915_iter, str, config);
>+
>+			if (events[i].unit) {
>+				if (events[i].global || !HAS_EXTRA_GT_LIST(i915))
>+					str = kasprintf(GFP_KERNEL, "%s.unit",
>+							events[i].name);
>+				else
>+					str = kasprintf(GFP_KERNEL, "%s-gt%u.unit",
>+							events[i].name, j);
>+				if (!str)
>+					goto err;
>+
>+				*attr_iter++ = &pmu_iter->attr.attr;
>+				pmu_iter = add_pmu_attr(pmu_iter, str,
>+							events[i].unit);
>+			}
> 		}
> 	}
>
>-- 
>2.36.1
>


* Re: [Intel-gfx] [PATCH 1/6] drm/i915/pmu: Support PMU for all engines
  2023-05-08 17:52   ` Umesh Nerlige Ramappa
@ 2023-05-09 12:26     ` Tvrtko Ursulin
  0 siblings, 0 replies; 45+ messages in thread
From: Tvrtko Ursulin @ 2023-05-09 12:26 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa, intel-gfx, Ashutosh Dixit


On 08/05/2023 18:52, Umesh Nerlige Ramappa wrote:
> On Fri, May 05, 2023 at 05:58:11PM -0700, Umesh Nerlige Ramappa wrote:
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> Given how the metrics are already exported, we also need to run sampling
>> over engines from all GTs.
>>
>> Problem of GT frequencies is left for later.
>>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
>> ---
>> drivers/gpu/drm/i915/i915_pmu.c | 13 ++++++++++---
>> 1 file changed, 10 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
>> index 7ece883a7d95..67fa6cd77529 100644
>> --- a/drivers/gpu/drm/i915/i915_pmu.c
>> +++ b/drivers/gpu/drm/i915/i915_pmu.c
>> @@ -10,6 +10,7 @@
>> #include "gt/intel_engine_pm.h"
>> #include "gt/intel_engine_regs.h"
>> #include "gt/intel_engine_user.h"
>> +#include "gt/intel_gt.h"
>> #include "gt/intel_gt_pm.h"
>> #include "gt/intel_gt_regs.h"
>> #include "gt/intel_rc6.h"
>> @@ -414,8 +415,9 @@ static enum hrtimer_restart i915_sample(struct hrtimer *hrtimer)
>>     struct drm_i915_private *i915 =
>>         container_of(hrtimer, struct drm_i915_private, pmu.timer);
>>     struct i915_pmu *pmu = &i915->pmu;
>> -    struct intel_gt *gt = to_gt(i915);
>>     unsigned int period_ns;
>> +    struct intel_gt *gt;
>> +    unsigned int i;
>>     ktime_t now;
>>
>>     if (!READ_ONCE(pmu->timer_enabled))
>> @@ -431,8 +433,13 @@ static enum hrtimer_restart i915_sample(struct hrtimer *hrtimer)
>>      * grabbing the forcewake. However the potential error from timer call-
>>      * back delay greatly dominates this so we keep it simple.
>>      */
>> -    engines_sample(gt, period_ns);
>> -    frequency_sample(gt, period_ns);
>> +
>> +    for_each_gt(gt, i915, i) {
>> +        engines_sample(gt, period_ns);
>> +
>> +        if (i == 0) /* FIXME */
>> +            frequency_sample(gt, period_ns);
> 
> If the current series already handles the FIXME in a later patch, I
> would just change this to a comment: /* Support gt0 for now */
> 
> With or without that, this is
> 
> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
> 
> @Tvrtko, Note that I am only transporting the patches (unmodified) from 
> internal to upstream, so assuming I am still a valid reviewer. If not, 
> let me know.

I think that is okay.

More of a problem is when you make comments like the above "I would just 
change" - the question then is are you expecting me to make that change? 
;) I think it would be best if you handled such tweaks in the series. In 
this particular patch it is probably not really required since it gets 
overwritten later as you say. It's probably just a left-over untidiness 
from "back in the day".

Regards,

Tvrtko


* Re: [Intel-gfx] [PATCH 6/6] drm/i915/pmu: Export counters from all tiles
  2023-05-06  0:58 ` [Intel-gfx] [PATCH 6/6] drm/i915/pmu: Export counters from all tiles Umesh Nerlige Ramappa
  2023-05-08 18:08   ` Umesh Nerlige Ramappa
@ 2023-05-09 12:38   ` Tvrtko Ursulin
  2023-05-11 18:57   ` Dixit, Ashutosh
  2 siblings, 0 replies; 45+ messages in thread
From: Tvrtko Ursulin @ 2023-05-09 12:38 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa, intel-gfx, Ashutosh Dixit


On 06/05/2023 01:58, Umesh Nerlige Ramappa wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> Start exporting frequency and RC6 counters from all tiles.
> 
> Existing counters keep their names and config values, and new ones use the
> namespace added in the previous patch, with "-gtN" appended to their
> names.
> 
> The interrupts counter is the odd one out. Because it is a global device
> counter (not only GT), we choose not to add per-tile versions for now.
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Signed-off-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com>
> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_pmu.c | 87 ++++++++++++++++++++++-----------
>   1 file changed, 59 insertions(+), 28 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
> index 12b2f3169abf..284e5c5b97bb 100644
> --- a/drivers/gpu/drm/i915/i915_pmu.c
> +++ b/drivers/gpu/drm/i915/i915_pmu.c
> @@ -546,8 +546,9 @@ config_status(struct drm_i915_private *i915, u64 config)
>   	struct intel_gt *gt = to_gt(i915);
>   
>   	unsigned int gt_id = config_gt_id(config);
> +	unsigned int max_gt_id = HAS_EXTRA_GT_LIST(i915) ? 1 : 0;
>   
> -	if (gt_id)
> +	if (gt_id > max_gt_id)

Would this be clearer as:

unsigned int num_gts = 1 + (unsigned int)HAS_EXTRA_GT_LIST(i915);

if (gt_id >= num_gts)

?

Just thinking out loud, no real opinion either way.
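The two bounds checks are equivalent. The sketch below compares them over every GT id that fits in the 4-bit field. It uses a hypothetical has_extra_gt flag in place of HAS_EXTRA_GT_LIST(i915) and assumes EX_PMU_MAX_GTS mirrors I915_PMU_MAX_GTS (4).

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed to mirror I915_PMU_MAX_GTS from i915_pmu.h. */
#define EX_PMU_MAX_GTS 4

/* Form in the patch; has_extra_gt stands in for HAS_EXTRA_GT_LIST(i915). */
static bool ex_reject_max_form(unsigned int gt_id, bool has_extra_gt)
{
	unsigned int max_gt_id = has_extra_gt ? 1 : 0;

	return gt_id > max_gt_id;
}

/* Suggested form, phrased as a GT count. */
static bool ex_reject_count_form(unsigned int gt_id, bool has_extra_gt)
{
	unsigned int num_gts = 1 + (unsigned int)has_extra_gt;

	assert(num_gts <= EX_PMU_MAX_GTS);
	return gt_id >= num_gts;
}

/* Compare both forms over every GT id that fits in the 4-bit field. */
static bool ex_forms_agree(void)
{
	for (unsigned int gt_id = 0; gt_id < 16; gt_id++) {
		if (ex_reject_max_form(gt_id, false) !=
		    ex_reject_count_form(gt_id, false))
			return false;
		if (ex_reject_max_form(gt_id, true) !=
		    ex_reject_count_form(gt_id, true))
			return false;
	}
	return true;
}
```

The count form also makes the relationship to I915_PMU_MAX_GTS easy to express as an invariant.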

>   		return -ENOENT;
>   
>   	switch (config_counter(config)) {
> @@ -561,6 +562,8 @@ config_status(struct drm_i915_private *i915, u64 config)
>   			return -ENODEV;
>   		break;
>   	case I915_PMU_INTERRUPTS:
> +		if (gt_id)
> +			return -ENOENT;
>   		break;
>   	case I915_PMU_RC6_RESIDENCY:
>   		if (!gt->rc6.supported)
> @@ -930,11 +933,20 @@ static const struct attribute_group i915_pmu_cpumask_attr_group = {
>   	.attrs = i915_cpumask_attrs,
>   };
>   
> -#define __event(__config, __name, __unit) \
> +#define __event(__counter, __name, __unit) \
>   { \
> -	.config = (__config), \
> +	.counter = (__counter), \
>   	.name = (__name), \
>   	.unit = (__unit), \
> +	.global = false, \
> +}
> +
> +#define __global_event(__counter, __name, __unit) \
> +{ \
> +	.counter = (__counter), \
> +	.name = (__name), \
> +	.unit = (__unit), \
> +	.global = true, \
>   }
>   
>   #define __engine_event(__sample, __name) \
> @@ -973,15 +985,16 @@ create_event_attributes(struct i915_pmu *pmu)
>   {
>   	struct drm_i915_private *i915 = container_of(pmu, typeof(*i915), pmu);
>   	static const struct {
> -		u64 config;
> +		unsigned int counter;
>   		const char *name;
>   		const char *unit;
> +		bool global;
>   	} events[] = {
> -		__event(I915_PMU_ACTUAL_FREQUENCY, "actual-frequency", "M"),
> -		__event(I915_PMU_REQUESTED_FREQUENCY, "requested-frequency", "M"),
> -		__event(I915_PMU_INTERRUPTS, "interrupts", NULL),
> -		__event(I915_PMU_RC6_RESIDENCY, "rc6-residency", "ns"),
> -		__event(I915_PMU_SOFTWARE_GT_AWAKE_TIME, "software-gt-awake-time", "ns"),
> +		__event(0, "actual-frequency", "M"),
> +		__event(1, "requested-frequency", "M"),
> +		__global_event(2, "interrupts", NULL),
> +		__event(3, "rc6-residency", "ns"),
> +		__event(4, "software-gt-awake-time", "ns"),
>   	};
>   	static const struct {
>   		enum drm_i915_pmu_engine_sample sample;
> @@ -996,12 +1009,17 @@ create_event_attributes(struct i915_pmu *pmu)
>   	struct i915_ext_attribute *i915_attr = NULL, *i915_iter;
>   	struct attribute **attr = NULL, **attr_iter;
>   	struct intel_engine_cs *engine;
> -	unsigned int i;
> +	struct intel_gt *gt;
> +	unsigned int i, j;
>   
>   	/* Count how many counters we will be exposing. */
> -	for (i = 0; i < ARRAY_SIZE(events); i++) {
> -		if (!config_status(i915, events[i].config))
> -			count++;
> +	for_each_gt(gt, i915, j) {
> +		for (i = 0; i < ARRAY_SIZE(events); i++) {
> +			u64 config = ___I915_PMU_OTHER(j, events[i].counter);
> +
> +			if (!config_status(i915, config))
> +				count++;
> +		}
>   	}
>   
>   	for_each_uabi_engine(engine, i915) {
> @@ -1031,26 +1049,39 @@ create_event_attributes(struct i915_pmu *pmu)
>   	attr_iter = attr;
>   
>   	/* Initialize supported non-engine counters. */
> -	for (i = 0; i < ARRAY_SIZE(events); i++) {
> -		char *str;
> -
> -		if (config_status(i915, events[i].config))
> -			continue;
> -
> -		str = kstrdup(events[i].name, GFP_KERNEL);
> -		if (!str)
> -			goto err;
> +	for_each_gt(gt, i915, j) {
> +		for (i = 0; i < ARRAY_SIZE(events); i++) {
> +			u64 config = ___I915_PMU_OTHER(j, events[i].counter);
> +			char *str;
>   
> -		*attr_iter++ = &i915_iter->attr.attr;
> -		i915_iter = add_i915_attr(i915_iter, str, events[i].config);
> +			if (config_status(i915, config))
> +				continue;
>   
> -		if (events[i].unit) {
> -			str = kasprintf(GFP_KERNEL, "%s.unit", events[i].name);
> +			if (events[i].global || !HAS_EXTRA_GT_LIST(i915))
> +				str = kstrdup(events[i].name, GFP_KERNEL);
> +			else
> +				str = kasprintf(GFP_KERNEL, "%s-gt%u",
> +						events[i].name, j);
>   			if (!str)
>   				goto err;
>   
> -			*attr_iter++ = &pmu_iter->attr.attr;
> -			pmu_iter = add_pmu_attr(pmu_iter, str, events[i].unit);
> +			*attr_iter++ = &i915_iter->attr.attr;
> +			i915_iter = add_i915_attr(i915_iter, str, config);
> +
> +			if (events[i].unit) {
> +				if (events[i].global || !HAS_EXTRA_GT_LIST(i915))

Maybe worth moving the condition to for_each_gt?

   bool use_gt_suffix = HAS_EXTRA_GT_LIST(i915) && !events[i].global;

Again, more questionable bike shedding.
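A sketch of that suggestion follows. It computes the suffix decision once per event and reuses it for both the name and the .unit attribute. To run in user space, snprintf() replaces kstrdup()/kasprintf(), and multi_gt is a hypothetical stand-in for HAS_EXTRA_GT_LIST(i915).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Builds "<name>[-gtN][.unit]" into buf and returns snprintf()'s result.
 * multi_gt stands in for HAS_EXTRA_GT_LIST(i915).
 */
static int ex_event_name(char *buf, size_t len, const char *name,
			 unsigned int gt, bool global, bool multi_gt,
			 bool unit_attr)
{
	/* Decide the suffix once instead of at both call sites. */
	bool use_gt_suffix = multi_gt && !global;
	const char *ext = unit_attr ? ".unit" : "";

	if (use_gt_suffix)
		return snprintf(buf, len, "%s-gt%u%s", name, gt, ext);
	return snprintf(buf, len, "%s%s", name, ext);
}
```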

Regards,

Tvrtko

> +					str = kasprintf(GFP_KERNEL, "%s.unit",
> +							events[i].name);
> +				else
> +					str = kasprintf(GFP_KERNEL, "%s-gt%u.unit",
> +							events[i].name, j);
> +				if (!str)
> +					goto err;
> +
> +				*attr_iter++ = &pmu_iter->attr.attr;
> +				pmu_iter = add_pmu_attr(pmu_iter, str,
> +							events[i].unit);
> +			}
>   		}
>   	}
>   


* Re: [Intel-gfx] [PATCH 3/6] drm/i915/pmu: Transform PMU parking code to be GT based
  2023-05-08 17:55   ` Umesh Nerlige Ramappa
@ 2023-05-09 15:10     ` Umesh Nerlige Ramappa
  0 siblings, 0 replies; 45+ messages in thread
From: Umesh Nerlige Ramappa @ 2023-05-09 15:10 UTC (permalink / raw)
  To: intel-gfx, Tvrtko Ursulin, Ashutosh Dixit

On Mon, May 08, 2023 at 10:55:01AM -0700, Umesh Nerlige Ramappa wrote:
>On Fri, May 05, 2023 at 05:58:13PM -0700, Umesh Nerlige Ramappa wrote:
>>From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>>Trivial prep work for full multi-tile enablement later.
>
>Some more description on what this does OR how park/unpark affects pmu 
>counters would help.

Described later, so

Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
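On the question of how park/unpark affects the counters: as I understand the existing RC6 handling, a parked GT's RC6 residency is estimated rather than read from hardware, which avoids waking the device just to read the register. The estimate is the value captured at park time plus the time elapsed since sleep_last. This series makes that state per GT (sleep_last[] and per-GT RC6 samples). The sketch below is a hypothetical user-space model of that idea. It omits locking and the clamp against the last reported value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EX_MAX_GTS 4

/* Hypothetical per-GT state, modelled on sample[RC6] and sleep_last[]. */
struct ex_rc6_state {
	uint64_t rc6_at_park[EX_MAX_GTS];	/* RC6 ns read at park time */
	uint64_t sleep_last[EX_MAX_GTS];	/* timestamp (ns) of last park */
	bool parked[EX_MAX_GTS];
};

static void ex_park(struct ex_rc6_state *s, unsigned int gt,
		    uint64_t hw_rc6_ns, uint64_t now_ns)
{
	assert(gt < EX_MAX_GTS);
	s->rc6_at_park[gt] = hw_rc6_ns;
	s->sleep_last[gt] = now_ns;
	s->parked[gt] = true;
}

static void ex_unpark(struct ex_rc6_state *s, unsigned int gt)
{
	assert(gt < EX_MAX_GTS);
	s->parked[gt] = false;
}

/* Awake GTs report hardware RC6; parked GTs add idle time since parking. */
static uint64_t ex_read_rc6(const struct ex_rc6_state *s, unsigned int gt,
			    uint64_t hw_rc6_ns, uint64_t now_ns)
{
	assert(gt < EX_MAX_GTS);
	if (!s->parked[gt])
		return hw_rc6_ns;
	return s->rc6_at_park[gt] + (now_ns - s->sleep_last[gt]);
}
```

Because the state is indexed by GT, parking GT1 leaves GT0's reading untouched.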

>
>Thanks,
>Umesh
>
>>
>>Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
>>Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
>>---
>>drivers/gpu/drm/i915/gt/intel_gt_pm.c |  4 ++--
>>drivers/gpu/drm/i915/i915_pmu.c       | 16 ++++++++--------
>>drivers/gpu/drm/i915/i915_pmu.h       |  9 +++++----
>>3 files changed, 15 insertions(+), 14 deletions(-)
>>
>>diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.c b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
>>index e02cb90723ae..c2e69bafd02b 100644
>>--- a/drivers/gpu/drm/i915/gt/intel_gt_pm.c
>>+++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
>>@@ -87,7 +87,7 @@ static int __gt_unpark(struct intel_wakeref *wf)
>>
>>	intel_rc6_unpark(&gt->rc6);
>>	intel_rps_unpark(&gt->rps);
>>-	i915_pmu_gt_unparked(i915);
>>+	i915_pmu_gt_unparked(gt);
>>	intel_guc_busyness_unpark(gt);
>>
>>	intel_gt_unpark_requests(gt);
>>@@ -109,7 +109,7 @@ static int __gt_park(struct intel_wakeref *wf)
>>
>>	intel_guc_busyness_park(gt);
>>	i915_vma_parked(gt);
>>-	i915_pmu_gt_parked(i915);
>>+	i915_pmu_gt_parked(gt);
>>	intel_rps_park(&gt->rps);
>>	intel_rc6_park(&gt->rc6);
>>
>>diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
>>index ba769f7fc385..2b63ee31e1b3 100644
>>--- a/drivers/gpu/drm/i915/i915_pmu.c
>>+++ b/drivers/gpu/drm/i915/i915_pmu.c
>>@@ -217,11 +217,11 @@ static void init_rc6(struct i915_pmu *pmu)
>>	}
>>}
>>
>>-static void park_rc6(struct drm_i915_private *i915)
>>+static void park_rc6(struct intel_gt *gt)
>>{
>>-	struct i915_pmu *pmu = &i915->pmu;
>>+	struct i915_pmu *pmu = &gt->i915->pmu;
>>
>>-	pmu->sample[__I915_SAMPLE_RC6].cur = __get_rc6(to_gt(i915));
>>+	pmu->sample[__I915_SAMPLE_RC6].cur = __get_rc6(gt);
>>	pmu->sleep_last = ktime_get_raw();
>>}
>>
>>@@ -236,16 +236,16 @@ static void __i915_pmu_maybe_start_timer(struct i915_pmu *pmu)
>>	}
>>}
>>
>>-void i915_pmu_gt_parked(struct drm_i915_private *i915)
>>+void i915_pmu_gt_parked(struct intel_gt *gt)
>>{
>>-	struct i915_pmu *pmu = &i915->pmu;
>>+	struct i915_pmu *pmu = &gt->i915->pmu;
>>
>>	if (!pmu->base.event_init)
>>		return;
>>
>>	spin_lock_irq(&pmu->lock);
>>
>>-	park_rc6(i915);
>>+	park_rc6(gt);
>>
>>	/*
>>	 * Signal sampling timer to stop if only engine events are enabled and
>>@@ -256,9 +256,9 @@ void i915_pmu_gt_parked(struct drm_i915_private *i915)
>>	spin_unlock_irq(&pmu->lock);
>>}
>>
>>-void i915_pmu_gt_unparked(struct drm_i915_private *i915)
>>+void i915_pmu_gt_unparked(struct intel_gt *gt)
>>{
>>-	struct i915_pmu *pmu = &i915->pmu;
>>+	struct i915_pmu *pmu = &gt->i915->pmu;
>>
>>	if (!pmu->base.event_init)
>>		return;
>>diff --git a/drivers/gpu/drm/i915/i915_pmu.h b/drivers/gpu/drm/i915/i915_pmu.h
>>index c30f43319a78..a686fd7ccedf 100644
>>--- a/drivers/gpu/drm/i915/i915_pmu.h
>>+++ b/drivers/gpu/drm/i915/i915_pmu.h
>>@@ -13,6 +13,7 @@
>>#include <uapi/drm/i915_drm.h>
>>
>>struct drm_i915_private;
>>+struct intel_gt;
>>
>>/*
>> * Non-engine events that we need to track enabled-disabled transition and
>>@@ -151,15 +152,15 @@ int i915_pmu_init(void);
>>void i915_pmu_exit(void);
>>void i915_pmu_register(struct drm_i915_private *i915);
>>void i915_pmu_unregister(struct drm_i915_private *i915);
>>-void i915_pmu_gt_parked(struct drm_i915_private *i915);
>>-void i915_pmu_gt_unparked(struct drm_i915_private *i915);
>>+void i915_pmu_gt_parked(struct intel_gt *gt);
>>+void i915_pmu_gt_unparked(struct intel_gt *gt);
>>#else
>>static inline int i915_pmu_init(void) { return 0; }
>>static inline void i915_pmu_exit(void) {}
>>static inline void i915_pmu_register(struct drm_i915_private *i915) {}
>>static inline void i915_pmu_unregister(struct drm_i915_private *i915) {}
>>-static inline void i915_pmu_gt_parked(struct drm_i915_private *i915) {}
>>-static inline void i915_pmu_gt_unparked(struct drm_i915_private *i915) {}
>>+static inline void i915_pmu_gt_parked(struct intel_gt *gt) {}
>>+static inline void i915_pmu_gt_unparked(struct intel_gt *gt) {}
>>#endif
>>
>>#endif
>>-- 
>>2.36.1
>>


* Re: [Intel-gfx] [PATCH 4/6] drm/i915/pmu: Add reference counting to the sampling timer
  2023-05-06  0:58 ` [Intel-gfx] [PATCH 4/6] drm/i915/pmu: Add reference counting to the sampling timer Umesh Nerlige Ramappa
  2023-05-08 17:58   ` Umesh Nerlige Ramappa
@ 2023-05-09 17:25   ` Dixit, Ashutosh
  2023-05-10  6:02     ` Dixit, Ashutosh
  2023-05-12 22:29   ` Dixit, Ashutosh
  2 siblings, 1 reply; 45+ messages in thread
From: Dixit, Ashutosh @ 2023-05-09 17:25 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-gfx

On Fri, 05 May 2023 17:58:14 -0700, Umesh Nerlige Ramappa wrote:
>
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>
> We do not want to have timers per tile and waste CPU cycles and energy via
> multiple wake-up sources, for a relatively un-important task of PMU
> sampling, so keeping a single timer works well. But we also do not want
> the first GT which goes idle to turn off the timer.
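The single shared timer described in the quoted commit message can be modelled as a reference count of awake GTs. The sketch below uses hypothetical names and leaves out locking and the pmu_needs_timer() conditions. It is not the code from the patch, which is not quoted here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: one device-wide timer shared by all GTs. */
struct ex_pmu {
	unsigned int unparked;	/* number of GTs currently awake */
	bool timer_enabled;
};

static void ex_gt_unparked(struct ex_pmu *pmu)
{
	/* The first GT to wake starts the shared timer. */
	if (pmu->unparked++ == 0)
		pmu->timer_enabled = true;
}

static void ex_gt_parked(struct ex_pmu *pmu)
{
	assert(pmu->unparked > 0);
	/* Only the last GT to go idle stops it. */
	if (--pmu->unparked == 0)
		pmu->timer_enabled = false;
}
```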

Apart from this efficiency, what is the reason for having a device level
PMU (which monitors gt level events), rather than independent gt level
PMU's (each of which monitor events from that gt)?

Wouldn't independent gt level PMU's be simpler? And user space tools (say
intel-gpu-top) would hook into events from a gt and treat each gt
independently?

So my question really is what is the reason for keeping the PMU device
level rather than per gt?

Thanks.
--
Ashutosh


* Re: [Intel-gfx] [PATCH 4/6] drm/i915/pmu: Add reference counting to the sampling timer
  2023-05-09 17:25   ` Dixit, Ashutosh
@ 2023-05-10  6:02     ` Dixit, Ashutosh
  0 siblings, 0 replies; 45+ messages in thread
From: Dixit, Ashutosh @ 2023-05-10  6:02 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-gfx

On Tue, 09 May 2023 10:25:16 -0700, Dixit, Ashutosh wrote:
>
> On Fri, 05 May 2023 17:58:14 -0700, Umesh Nerlige Ramappa wrote:
> >
> > From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >
> > We do not want to have timers per tile and waste CPU cycles and energy via
> > multiple wake-up sources, for a relatively un-important task of PMU
> > sampling, so keeping a single timer works well. But we also do not want
> > the first GT which goes idle to turn off the timer.
>
> Apart from this efficiency, what is the reason for having a device level
> PMU (which monitors gt level events), rather than independent gt level
> PMU's (each of which monitor events from that gt)?
>
> Wouldn't independent gt level PMU's be simpler? And user space tools (say
> intel-gpu-top) would hook into events from a gt and treat each gt
> independently?
>
> So my question really is what is the reason for keeping the PMU device
> level rather than per gt?

Maybe ignore this for now, the way it is expressed it is too open
ended. Let me get a better handle on the code and the patches and I'll see
if I have anything to say.

Thanks.
--
Ashutosh


* Re: [Intel-gfx] [PATCH 6/6] drm/i915/pmu: Export counters from all tiles
  2023-05-06  0:58 ` [Intel-gfx] [PATCH 6/6] drm/i915/pmu: Export counters from all tiles Umesh Nerlige Ramappa
  2023-05-08 18:08   ` Umesh Nerlige Ramappa
  2023-05-09 12:38   ` Tvrtko Ursulin
@ 2023-05-11 18:57   ` Dixit, Ashutosh
  2023-05-12 10:57     ` Tvrtko Ursulin
  2 siblings, 1 reply; 45+ messages in thread
From: Dixit, Ashutosh @ 2023-05-11 18:57 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-gfx

On Fri, 05 May 2023 17:58:16 -0700, Umesh Nerlige Ramappa wrote:
>

One drive-by comment:

> diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
> index 12b2f3169abf..284e5c5b97bb 100644
> --- a/drivers/gpu/drm/i915/i915_pmu.c
> +++ b/drivers/gpu/drm/i915/i915_pmu.c
> @@ -546,8 +546,9 @@ config_status(struct drm_i915_private *i915, u64 config)
>	struct intel_gt *gt = to_gt(i915);
>
>	unsigned int gt_id = config_gt_id(config);
> +	unsigned int max_gt_id = HAS_EXTRA_GT_LIST(i915) ? 1 : 0;

But in Patch 5 we have:

#define I915_PMU_MAX_GTS (4)

>
> -	if (gt_id)
> +	if (gt_id > max_gt_id)
>		return -ENOENT;
>
>	switch (config_counter(config)) {
> @@ -561,6 +562,8 @@ config_status(struct drm_i915_private *i915, u64 config)
>			return -ENODEV;
>		break;
>	case I915_PMU_INTERRUPTS:
> +		if (gt_id)
> +			return -ENOENT;
>		break;
>	case I915_PMU_RC6_RESIDENCY:
>		if (!gt->rc6.supported)

Thanks.
--
Ashutosh


* Re: [Intel-gfx] [PATCH 5/6] drm/i915/pmu: Prepare for multi-tile non-engine counters
  2023-05-06  0:58 ` [Intel-gfx] [PATCH 5/6] drm/i915/pmu: Prepare for multi-tile non-engine counters Umesh Nerlige Ramappa
  2023-05-08 18:07   ` Umesh Nerlige Ramappa
@ 2023-05-12  1:08   ` Dixit, Ashutosh
  2023-05-12 10:56     ` Tvrtko Ursulin
  1 sibling, 1 reply; 45+ messages in thread
From: Dixit, Ashutosh @ 2023-05-12  1:08 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-gfx

On Fri, 05 May 2023 17:58:15 -0700, Umesh Nerlige Ramappa wrote:
>
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>
> Reserve some bits in the counter config namespace which will carry the
> tile id and prepare the code to handle this.
>
> No per tile counters have been added yet.
>
> v2:
> - Fix checkpatch issues
> - Use 4 bits for gt id in non-engine counters. Drop FIXME.
> - Set MAX GTs to 4. Drop FIXME.
>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_pmu.c | 150 +++++++++++++++++++++++---------
>  drivers/gpu/drm/i915/i915_pmu.h |   9 +-
>  include/uapi/drm/i915_drm.h     |  17 +++-
>  3 files changed, 129 insertions(+), 47 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
> index 669a42e44082..12b2f3169abf 100644
> --- a/drivers/gpu/drm/i915/i915_pmu.c
> +++ b/drivers/gpu/drm/i915/i915_pmu.c
> @@ -56,11 +56,21 @@ static bool is_engine_config(u64 config)
>	return config < __I915_PMU_OTHER(0);
>  }
>
> +static unsigned int config_gt_id(const u64 config)
> +{
> +	return config >> __I915_PMU_GT_SHIFT;
> +}
> +
> +static u64 config_counter(const u64 config)
> +{
> +	return config & ~(~0ULL << __I915_PMU_GT_SHIFT);

ok, but another possibility:

	return config & ~REG_GENMASK64(63, __I915_PMU_GT_SHIFT);
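Both forms clear the GT-id field and keep bits 59..0. The GENMASK form simply names the field. The sketch below checks that equivalence. EX_GENMASK64 is a local stand-in with GENMASK_ULL(h, l) semantics, not the i915 REG_GENMASK64 macro itself.

```c
#include <assert.h>
#include <stdint.h>

#define EX_PMU_GT_SHIFT 60

/* Local stand-in with GENMASK_ULL(h, l) semantics: bits h..l set. */
#define EX_GENMASK64(h, l) \
	((~0ULL >> (63 - (h))) & (~0ULL << (l)))

/* Form used in the patch: clear everything from bit 60 upwards. */
static uint64_t counter_shift_form(uint64_t config)
{
	return config & ~(~0ULL << EX_PMU_GT_SHIFT);
}

/* Suggested form: clear the named GT-id field, bits 63..60. */
static uint64_t counter_genmask_form(uint64_t config)
{
	return config & ~EX_GENMASK64(63, EX_PMU_GT_SHIFT);
}

/* Both masks keep exactly bits 59..0. */
static_assert(~(~0ULL << EX_PMU_GT_SHIFT) == EX_GENMASK64(59, 0),
	      "masks differ");
```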

> +}
> +
>  static unsigned int other_bit(const u64 config)
>  {
>	unsigned int val;
>
> -	switch (config) {
> +	switch (config_counter(config)) {
>	case I915_PMU_ACTUAL_FREQUENCY:
>		val =  __I915_PMU_ACTUAL_FREQUENCY_ENABLED;
>		break;
> @@ -78,15 +88,20 @@ static unsigned int other_bit(const u64 config)
>		return -1;
>	}
>
> -	return I915_ENGINE_SAMPLE_COUNT + val;
> +	return I915_ENGINE_SAMPLE_COUNT +
> +	       config_gt_id(config) * __I915_PMU_TRACKED_EVENT_COUNT +
> +	       val;
>  }
>
>  static unsigned int config_bit(const u64 config)
>  {
> -	if (is_engine_config(config))
> +	if (is_engine_config(config)) {
> +		GEM_BUG_ON(config_gt_id(config));

This GEM_BUG_ON is not needed since:

	static bool is_engine_config(u64 config)
	{
	        return config < __I915_PMU_OTHER(0);
	}

> +
>		return engine_config_sample(config);
> -	else
> +	} else {
>		return other_bit(config);
> +	}
>  }
>
>  static u64 config_mask(u64 config)
> @@ -104,6 +119,18 @@ static unsigned int event_bit(struct perf_event *event)
>	return config_bit(event->attr.config);
>  }
>
> +static u64 frequency_enabled_mask(void)
> +{
> +	unsigned int i;
> +	u64 mask = 0;
> +
> +	for (i = 0; i < I915_PMU_MAX_GTS; i++)
> +		mask |= config_mask(__I915_PMU_ACTUAL_FREQUENCY(i)) |
> +			config_mask(__I915_PMU_REQUESTED_FREQUENCY(i));
> +
> +	return mask;
> +}
> +
>  static bool pmu_needs_timer(struct i915_pmu *pmu, bool gpu_active)
>  {
>	struct drm_i915_private *i915 = container_of(pmu, typeof(*i915), pmu);
> @@ -120,9 +147,7 @@ static bool pmu_needs_timer(struct i915_pmu *pmu, bool gpu_active)
>	 * Mask out all the ones which do not need the timer, or in
>	 * other words keep all the ones that could need the timer.
>	 */
> -	enable &= config_mask(I915_PMU_ACTUAL_FREQUENCY) |
> -		  config_mask(I915_PMU_REQUESTED_FREQUENCY) |
> -		  ENGINE_SAMPLE_MASK;
> +	enable &= frequency_enabled_mask() | ENGINE_SAMPLE_MASK;

u32 enable & u64 frequency_enabled_mask

ugly but ok I guess? Or change enable to u64?
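A userspace sketch of where the u32/u64 mix can actually lose information. The `&=` itself is harmless, because a u32 enable word has no bits above 31 to lose. The real hazard is `pmu->enable |= BIT_ULL(bit)`, which silently drops any bit at index 32 or higher. The sizes below (3 engine samplers, 3 tracked events per GT) are assumptions. Under them the highest bit is 14, which would explain why the current code happens to work.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed sizes: busy/wait/sema engine samplers, and actual freq,
 * requested freq, RC6 as tracked non-engine events per GT. */
#define ENGINE_SAMPLE_COUNT	3
#define TRACKED_EVENT_COUNT	3
#define PMU_MAX_GTS		4

/* Same arithmetic as other_bit() in the patch. */
static unsigned int other_bit(unsigned int gt_id, unsigned int val)
{
	return ENGINE_SAMPLE_COUNT + gt_id * TRACKED_EVENT_COUNT + val;
}

/* Models "pmu->enable |= BIT_ULL(bit)" with a u32 enable word: the
 * 64-bit result is converted back to 32 bits, dropping bits >= 32. */
static uint32_t enable_u32(uint32_t enable, unsigned int bit)
{
	enable |= 1ULL << bit;
	return enable;
}

/* The same operation with enable widened to u64. */
static uint64_t enable_u64(uint64_t enable, unsigned int bit)
{
	enable |= 1ULL << bit;
	return enable;
}
```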

>
>	/*
>	 * When the GPU is idle per-engine counters do not need to be
> @@ -164,9 +189,37 @@ static inline s64 ktime_since_raw(const ktime_t kt)
>	return ktime_to_ns(ktime_sub(ktime_get_raw(), kt));
>  }
>
> +static unsigned int
> +__sample_idx(struct i915_pmu *pmu, unsigned int gt_id, int sample)
> +{
> +	unsigned int idx = gt_id * __I915_NUM_PMU_SAMPLERS + sample;
> +
> +	GEM_BUG_ON(idx >= ARRAY_SIZE(pmu->sample));

Does this GEM_BUG_ON need to be split up as follows:

	GEM_BUG_ON(gt_id >= I915_PMU_MAX_GTS);
	GEM_BUG_ON(sample >= __I915_NUM_PMU_SAMPLERS);

Since that is what we really mean here isn't it?
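Here is a small model of the difference between the two checks. It assumes 4 samplers per GT (FREQ_ACT, FREQ_REQ, RC6, RC6_LAST_REPORTED). The flat ARRAY_SIZE check only catches indices past the end of the whole array. An out-of-range sampler index for GT 0 therefore passes it while aliasing a slot that belongs to GT 1. The split checks reject that case.

```c
#include <assert.h>
#include <stdbool.h>

#define PMU_MAX_GTS	4
#define NUM_SAMPLERS	4 /* assumed sampler count per GT */
#define ARRAY_SIZE(a)	(sizeof(a) / sizeof((a)[0]))

static unsigned long long sample[PMU_MAX_GTS * NUM_SAMPLERS];

static unsigned int sample_idx(unsigned int gt_id, unsigned int s)
{
	return gt_id * NUM_SAMPLERS + s;
}

/* Check as in the patch: bounds the flat index only. */
static bool flat_check_ok(unsigned int gt_id, unsigned int s)
{
	return sample_idx(gt_id, s) < ARRAY_SIZE(sample);
}

/* Split checks as suggested: each coordinate bounded on its own. */
static bool split_check_ok(unsigned int gt_id, unsigned int s)
{
	return gt_id < PMU_MAX_GTS && s < NUM_SAMPLERS;
}
```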

> +
> +	return idx;
> +}
> +
> +static u64 read_sample(struct i915_pmu *pmu, unsigned int gt_id, int sample)
> +{
> +	return pmu->sample[__sample_idx(pmu, gt_id, sample)].cur;
> +}
> +
> +static void
> +store_sample(struct i915_pmu *pmu, unsigned int gt_id, int sample, u64 val)
> +{
> +	pmu->sample[__sample_idx(pmu, gt_id, sample)].cur = val;
> +}
> +
> +static void
> +add_sample_mult(struct i915_pmu *pmu, unsigned int gt_id, int sample, u32 val, u32 mul)
> +{
> +	pmu->sample[__sample_idx(pmu, gt_id, sample)].cur += mul_u32_u32(val, mul);
> +}

Gripe: I think this code should have per-event data structures which store
all information about a particular event, rather than storing it in these
arrays common to all events (and in bit-fields common to all events), which
results in the kind of dance we have to do here. Anyway, too big a change to
make now, but something to consider if we ever do this for xe.
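A rough sketch of what a per-event record could look like. Every name here is hypothetical and not taken from i915 or xe. The enable refcount and the accumulated sample for one (GT, counter) pair live together, so there is no shared bitmask or flat index to compute.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-event record for one (GT, counter) pair. */
struct pmu_event_state {
	unsigned int gt_id;
	unsigned int counter;	/* e.g. 0 = actual freq, 1 = requested freq */
	unsigned int refcount;	/* would replace the per-bit enable counts */
	uint64_t cur;		/* would replace pmu->sample[...].cur */
};

static void event_enable(struct pmu_event_state *ev)
{
	ev->refcount++;
}

static void event_disable(struct pmu_event_state *ev)
{
	assert(ev->refcount > 0);
	ev->refcount--;
}

static int event_is_enabled(const struct pmu_event_state *ev)
{
	return ev->refcount != 0;
}

/* Counterpart of add_sample_mult(): accumulate val * mul in 64 bits. */
static void event_add_sample_mult(struct pmu_event_state *ev,
				  uint32_t val, uint32_t mul)
{
	ev->cur += (uint64_t)val * mul;
}
```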

> +
>  static u64 get_rc6(struct intel_gt *gt)
>  {
>	struct drm_i915_private *i915 = gt->i915;
> +	const unsigned int gt_id = gt->info.id;
>	struct i915_pmu *pmu = &i915->pmu;
>	unsigned long flags;
>	bool awake = false;
> @@ -181,7 +234,7 @@ static u64 get_rc6(struct intel_gt *gt)
>	spin_lock_irqsave(&pmu->lock, flags);
>
>	if (awake) {
> -		pmu->sample[__I915_SAMPLE_RC6].cur = val;
> +		store_sample(pmu, gt_id, __I915_SAMPLE_RC6, val);
>	} else {
>		/*
>		 * We think we are runtime suspended.
> @@ -190,14 +243,14 @@ static u64 get_rc6(struct intel_gt *gt)
>		 * on top of the last known real value, as the approximated RC6
>		 * counter value.
>		 */
> -		val = ktime_since_raw(pmu->sleep_last);
> -		val += pmu->sample[__I915_SAMPLE_RC6].cur;
> +		val = ktime_since_raw(pmu->sleep_last[gt_id]);
> +		val += read_sample(pmu, gt_id, __I915_SAMPLE_RC6);
>	}
>
> -	if (val < pmu->sample[__I915_SAMPLE_RC6_LAST_REPORTED].cur)
> -		val = pmu->sample[__I915_SAMPLE_RC6_LAST_REPORTED].cur;
> +	if (val < read_sample(pmu, gt_id, __I915_SAMPLE_RC6_LAST_REPORTED))
> +		val = read_sample(pmu, gt_id, __I915_SAMPLE_RC6_LAST_REPORTED);
>	else
> -		pmu->sample[__I915_SAMPLE_RC6_LAST_REPORTED].cur = val;
> +		store_sample(pmu, gt_id, __I915_SAMPLE_RC6_LAST_REPORTED, val);
>
>	spin_unlock_irqrestore(&pmu->lock, flags);
>
> @@ -207,13 +260,20 @@ static u64 get_rc6(struct intel_gt *gt)
>  static void init_rc6(struct i915_pmu *pmu)
>  {
>	struct drm_i915_private *i915 = container_of(pmu, typeof(*i915), pmu);
> -	intel_wakeref_t wakeref;
> +	struct intel_gt *gt;
> +	unsigned int i;
> +
> +	for_each_gt(gt, i915, i) {
> +		intel_wakeref_t wakeref;
>
> -	with_intel_runtime_pm(to_gt(i915)->uncore->rpm, wakeref) {
> -		pmu->sample[__I915_SAMPLE_RC6].cur = __get_rc6(to_gt(i915));
> -		pmu->sample[__I915_SAMPLE_RC6_LAST_REPORTED].cur =
> -					pmu->sample[__I915_SAMPLE_RC6].cur;
> -		pmu->sleep_last = ktime_get_raw();
> +		with_intel_runtime_pm(gt->uncore->rpm, wakeref) {
> +			u64 val = __get_rc6(gt);
> +
> +			store_sample(pmu, i, __I915_SAMPLE_RC6, val);
> +			store_sample(pmu, i, __I915_SAMPLE_RC6_LAST_REPORTED,
> +				     val);
> +			pmu->sleep_last[i] = ktime_get_raw();
> +		}
>	}
>  }
>
> @@ -221,8 +281,8 @@ static void park_rc6(struct intel_gt *gt)
>  {
>	struct i915_pmu *pmu = &gt->i915->pmu;
>
> -	pmu->sample[__I915_SAMPLE_RC6].cur = __get_rc6(gt);
> -	pmu->sleep_last = ktime_get_raw();
> +	store_sample(pmu, gt->info.id, __I915_SAMPLE_RC6, __get_rc6(gt));
> +	pmu->sleep_last[gt->info.id] = ktime_get_raw();
>  }
>
>  static void __i915_pmu_maybe_start_timer(struct i915_pmu *pmu)
> @@ -362,34 +422,30 @@ engines_sample(struct intel_gt *gt, unsigned int period_ns)
>	}
>  }
>
> -static void
> -add_sample_mult(struct i915_pmu_sample *sample, u32 val, u32 mul)
> -{
> -	sample->cur += mul_u32_u32(val, mul);
> -}
> -
> -static bool frequency_sampling_enabled(struct i915_pmu *pmu)
> +static bool
> +frequency_sampling_enabled(struct i915_pmu *pmu, unsigned int gt)
>  {
>	return pmu->enable &
> -	       (config_mask(I915_PMU_ACTUAL_FREQUENCY) |
> -		config_mask(I915_PMU_REQUESTED_FREQUENCY));
> +	       (config_mask(__I915_PMU_ACTUAL_FREQUENCY(gt)) |
> +		config_mask(__I915_PMU_REQUESTED_FREQUENCY(gt)));

Here again:

	u32 pmu->enable & u64 config_mask

Probably ok?

And also in i915_pmu_enable() we have:

	pmu->enable |= BIT_ULL(bit);

So change pmu->enable to u64?

>  }
>
>  static void
>  frequency_sample(struct intel_gt *gt, unsigned int period_ns)
>  {
>	struct drm_i915_private *i915 = gt->i915;
> +	const unsigned int gt_id = gt->info.id;
>	struct i915_pmu *pmu = &i915->pmu;
>	struct intel_rps *rps = &gt->rps;
>
> -	if (!frequency_sampling_enabled(pmu))
> +	if (!frequency_sampling_enabled(pmu, gt_id))
>		return;

Pre-existing issue, but why do we need this check? This is already checked
in the two individual checks for actual and requested freq below:

	if (pmu->enable & config_mask(__I915_PMU_ACTUAL_FREQUENCY(gt_id)))

	and

	if (pmu->enable & config_mask(__I915_PMU_REQUESTED_FREQUENCY(gt_id))) 

So can we delete frequency_sampling_enabled()? Or is it there to avoid the
overhead of intel_gt_pm_get_if_awake() (which doesn't seem to be much)?

>
>	/* Report 0/0 (actual/requested) frequency while parked. */
>	if (!intel_gt_pm_get_if_awake(gt))
>		return;
>
> -	if (pmu->enable & config_mask(I915_PMU_ACTUAL_FREQUENCY)) {
> +	if (pmu->enable & config_mask(__I915_PMU_ACTUAL_FREQUENCY(gt_id))) {
>		u32 val;
>
>		/*
> @@ -405,12 +461,12 @@ frequency_sample(struct intel_gt *gt, unsigned int period_ns)
>		if (!val)
>			val = intel_gpu_freq(rps, rps->cur_freq);
>
> -		add_sample_mult(&pmu->sample[__I915_SAMPLE_FREQ_ACT],
> +		add_sample_mult(pmu, gt_id, __I915_SAMPLE_FREQ_ACT,
>				val, period_ns / 1000);
>	}
>
> -	if (pmu->enable & config_mask(I915_PMU_REQUESTED_FREQUENCY)) {
> -		add_sample_mult(&pmu->sample[__I915_SAMPLE_FREQ_REQ],
> +	if (pmu->enable & config_mask(__I915_PMU_REQUESTED_FREQUENCY(gt_id))) {
> +		add_sample_mult(pmu, gt_id, __I915_SAMPLE_FREQ_REQ,
>				intel_rps_get_requested_frequency(rps),
>				period_ns / 1000);
>	}
> @@ -447,9 +503,7 @@ static enum hrtimer_restart i915_sample(struct hrtimer *hrtimer)
>			continue;
>
>		engines_sample(gt, period_ns);
> -
> -		if (i == 0) /* FIXME */
> -			frequency_sample(gt, period_ns);
> +		frequency_sample(gt, period_ns);
>	}
>
>	hrtimer_forward(hrtimer, now, ns_to_ktime(PERIOD));
> @@ -491,7 +545,12 @@ config_status(struct drm_i915_private *i915, u64 config)
>  {
>	struct intel_gt *gt = to_gt(i915);
>
> -	switch (config) {
> +	unsigned int gt_id = config_gt_id(config);
> +
> +	if (gt_id)
> +		return -ENOENT;

This is just wrong. It is fixed in the next patch:

	if (gt_id > max_gt_id)
		return -ENOENT;

But probably should be fixed in this patch itself. Or dropped from this
patch and let it come in in Patch 6, since it's confusing. Though it
probably belongs in this patch.

> +
> +	switch (config_counter(config)) {
>	case I915_PMU_ACTUAL_FREQUENCY:
>		if (IS_VALLEYVIEW(i915) || IS_CHERRYVIEW(i915))
>			/* Requires a mutex for sampling! */
> @@ -599,22 +658,27 @@ static u64 __i915_pmu_event_read(struct perf_event *event)
>			val = engine->pmu.sample[sample].cur;
>		}
>	} else {
> -		switch (event->attr.config) {
> +		const unsigned int gt_id = config_gt_id(event->attr.config);
> +		const u64 config = config_counter(event->attr.config);
> +
> +		switch (config) {
>		case I915_PMU_ACTUAL_FREQUENCY:
>			val =
> -			   div_u64(pmu->sample[__I915_SAMPLE_FREQ_ACT].cur,
> +			   div_u64(read_sample(pmu, gt_id,
> +					       __I915_SAMPLE_FREQ_ACT),
>				   USEC_PER_SEC /* to MHz */);
>			break;
>		case I915_PMU_REQUESTED_FREQUENCY:
>			val =
> -			   div_u64(pmu->sample[__I915_SAMPLE_FREQ_REQ].cur,
> +			   div_u64(read_sample(pmu, gt_id,
> +					       __I915_SAMPLE_FREQ_REQ),
>				   USEC_PER_SEC /* to MHz */);
>			break;
>		case I915_PMU_INTERRUPTS:
>			val = READ_ONCE(pmu->irq_count);
>			break;
>		case I915_PMU_RC6_RESIDENCY:
> -			val = get_rc6(to_gt(i915));
> +			val = get_rc6(i915->gt[gt_id]);
>			break;
>		case I915_PMU_SOFTWARE_GT_AWAKE_TIME:
>			val = ktime_to_ns(intel_gt_get_awake_time(to_gt(i915)));
> diff --git a/drivers/gpu/drm/i915/i915_pmu.h b/drivers/gpu/drm/i915/i915_pmu.h
> index 3a811266ac6a..d47846f21ddf 100644
> --- a/drivers/gpu/drm/i915/i915_pmu.h
> +++ b/drivers/gpu/drm/i915/i915_pmu.h
> @@ -38,13 +38,16 @@ enum {
>	__I915_NUM_PMU_SAMPLERS
>  };
>
> +#define I915_PMU_MAX_GTS (4)

4 or (4)? :-)

> +
>  /*
>   * How many different events we track in the global PMU mask.
>   *
>   * It is also used to know to needed number of event reference counters.
>   */
>  #define I915_PMU_MASK_BITS \
> -	(I915_ENGINE_SAMPLE_COUNT + __I915_PMU_TRACKED_EVENT_COUNT)
> +	(I915_ENGINE_SAMPLE_COUNT + \
> +	 I915_PMU_MAX_GTS * __I915_PMU_TRACKED_EVENT_COUNT)
>
>  #define I915_ENGINE_SAMPLE_COUNT (I915_SAMPLE_SEMA + 1)
>
> @@ -124,11 +127,11 @@ struct i915_pmu {
>	 * Only global counters are held here, while the per-engine ones are in
>	 * struct intel_engine_cs.
>	 */
> -	struct i915_pmu_sample sample[__I915_NUM_PMU_SAMPLERS];
> +	struct i915_pmu_sample sample[I915_PMU_MAX_GTS * __I915_NUM_PMU_SAMPLERS];
>	/**
>	 * @sleep_last: Last time GT parked for RC6 estimation.
>	 */
> -	ktime_t sleep_last;
> +	ktime_t sleep_last[I915_PMU_MAX_GTS];
>	/**
>	 * @irq_count: Number of interrupts
>	 *
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index dba7c5a5b25e..d5ac1fdeb2b1 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -280,7 +280,16 @@ enum drm_i915_pmu_engine_sample {
>  #define I915_PMU_ENGINE_SEMA(class, instance) \
>	__I915_PMU_ENGINE(class, instance, I915_SAMPLE_SEMA)
>
> -#define __I915_PMU_OTHER(x) (__I915_PMU_ENGINE(0xff, 0xff, 0xf) + 1 + (x))
> +/*
> + * Top 4 bits of every non-engine counter are GT id.
> + */
> +#define __I915_PMU_GT_SHIFT (60)

REG_GENMASK64 or GENMASK_ULL would be nicer but of course we can't put in
the uapi header, so ok.
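For reference, here is how userspace would build a per-GT config from these macros. The value is what goes into `perf_event_attr.config`. The engine macros are assumed copies of the uapi header with this patch applied. The GT bits use a plain shift, since GENMASK_ULL is not available to uapi.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace copies of the uapi layout (assumed to match i915_drm.h). */
#define PMU_SAMPLE_BITS			4
#define PMU_SAMPLE_INSTANCE_BITS	8
#define PMU_CLASS_SHIFT			(PMU_SAMPLE_BITS + PMU_SAMPLE_INSTANCE_BITS)
#define PMU_ENGINE(class, instance, sample) \
	((class) << PMU_CLASS_SHIFT | (instance) << PMU_SAMPLE_BITS | (sample))
#define PMU_GT_SHIFT			(60)

/* Plain shifts stand in for GENMASK_ULL, which uapi cannot use. */
#define PMU_OTHER_GT(gt, x) \
	(((uint64_t)PMU_ENGINE(0xff, 0xff, 0xf) + 1 + (x)) | \
	 ((uint64_t)(gt) << PMU_GT_SHIFT))

#define PMU_RC6_RESIDENCY_GT(gt)	PMU_OTHER_GT(gt, 3)

/* Config value for the RC6 residency counter of a given GT. */
static uint64_t rc6_config_for_gt(unsigned int gt)
{
	return PMU_RC6_RESIDENCY_GT(gt);
}
```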

> +
> +#define ___I915_PMU_OTHER(gt, x) \
> +	(((__u64)__I915_PMU_ENGINE(0xff, 0xff, 0xf) + 1 + (x)) | \
> +	((__u64)(gt) << __I915_PMU_GT_SHIFT))
> +
> +#define __I915_PMU_OTHER(x) ___I915_PMU_OTHER(0, x)
>
>  #define I915_PMU_ACTUAL_FREQUENCY	__I915_PMU_OTHER(0)
>  #define I915_PMU_REQUESTED_FREQUENCY	__I915_PMU_OTHER(1)
> @@ -290,6 +299,12 @@ enum drm_i915_pmu_engine_sample {
>
>  #define I915_PMU_LAST /* Deprecated - do not use */ I915_PMU_RC6_RESIDENCY
>
> +#define __I915_PMU_ACTUAL_FREQUENCY(gt)		___I915_PMU_OTHER(gt, 0)
> +#define __I915_PMU_REQUESTED_FREQUENCY(gt)	___I915_PMU_OTHER(gt, 1)
> +#define __I915_PMU_INTERRUPTS(gt)		___I915_PMU_OTHER(gt, 2)
> +#define __I915_PMU_RC6_RESIDENCY(gt)		___I915_PMU_OTHER(gt, 3)
> +#define __I915_PMU_SOFTWARE_GT_AWAKE_TIME(gt)	___I915_PMU_OTHER(gt, 4)
> +
>  /* Each region is a minimum of 16k, and there are at most 255 of them.
>   */
>  #define I915_NR_TEX_REGIONS 255	/* table size 2k - maximum due to use
> --
> 2.36.1
>

Above comments are mostly nits so after addressing the above comments, this
is:

Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>

Thanks.
--
Ashutosh


* Re: [Intel-gfx] [PATCH 5/6] drm/i915/pmu: Prepare for multi-tile non-engine counters
  2023-05-12  1:08   ` Dixit, Ashutosh
@ 2023-05-12 10:56     ` Tvrtko Ursulin
  2023-05-12 20:57       ` Umesh Nerlige Ramappa
  0 siblings, 1 reply; 45+ messages in thread
From: Tvrtko Ursulin @ 2023-05-12 10:56 UTC (permalink / raw)
  To: Dixit, Ashutosh, Umesh Nerlige Ramappa; +Cc: intel-gfx


On 12/05/2023 02:08, Dixit, Ashutosh wrote:
> On Fri, 05 May 2023 17:58:15 -0700, Umesh Nerlige Ramappa wrote:
>>
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> Reserve some bits in the counter config namespace which will carry the
>> tile id and prepare the code to handle this.
>>
>> No per tile counters have been added yet.
>>
>> v2:
>> - Fix checkpatch issues
>> - Use 4 bits for gt id in non-engine counters. Drop FIXME.
>> - Set MAX GTs to 4. Drop FIXME.
>>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
>> ---
>>   drivers/gpu/drm/i915/i915_pmu.c | 150 +++++++++++++++++++++++---------
>>   drivers/gpu/drm/i915/i915_pmu.h |   9 +-
>>   include/uapi/drm/i915_drm.h     |  17 +++-
>>   3 files changed, 129 insertions(+), 47 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
>> index 669a42e44082..12b2f3169abf 100644
>> --- a/drivers/gpu/drm/i915/i915_pmu.c
>> +++ b/drivers/gpu/drm/i915/i915_pmu.c
>> @@ -56,11 +56,21 @@ static bool is_engine_config(u64 config)
>> 	return config < __I915_PMU_OTHER(0);
>>   }
>>
>> +static unsigned int config_gt_id(const u64 config)
>> +{
>> +	return config >> __I915_PMU_GT_SHIFT;
>> +}
>> +
>> +static u64 config_counter(const u64 config)
>> +{
>> +	return config & ~(~0ULL << __I915_PMU_GT_SHIFT);
> 
> ok, but another possibility:
> 
> 	return config & ~REG_GENMASK64(63, __I915_PMU_GT_SHIFT);

It's not a register so no. :) GENMASK_ULL maybe but meh.

>> +}
>> +
>>   static unsigned int other_bit(const u64 config)
>>   {
>> 	unsigned int val;
>>
>> -	switch (config) {
>> +	switch (config_counter(config)) {
>> 	case I915_PMU_ACTUAL_FREQUENCY:
>> 		val =  __I915_PMU_ACTUAL_FREQUENCY_ENABLED;
>> 		break;
>> @@ -78,15 +88,20 @@ static unsigned int other_bit(const u64 config)
>> 		return -1;
>> 	}
>>
>> -	return I915_ENGINE_SAMPLE_COUNT + val;
>> +	return I915_ENGINE_SAMPLE_COUNT +
>> +	       config_gt_id(config) * __I915_PMU_TRACKED_EVENT_COUNT +
>> +	       val;
>>   }
>>
>>   static unsigned int config_bit(const u64 config)
>>   {
>> -	if (is_engine_config(config))
>> +	if (is_engine_config(config)) {
>> +		GEM_BUG_ON(config_gt_id(config));
> 
> This GEM_BUG_ON is not needed since:
> 
> 	static bool is_engine_config(u64 config)
> 	{
> 	        return config < __I915_PMU_OTHER(0);
> 	}

True!

>> +
>> 		return engine_config_sample(config);
>> -	else
>> +	} else {
>> 		return other_bit(config);
>> +	}
>>   }
>>
>>   static u64 config_mask(u64 config)
>> @@ -104,6 +119,18 @@ static unsigned int event_bit(struct perf_event *event)
>> 	return config_bit(event->attr.config);
>>   }
>>
>> +static u64 frequency_enabled_mask(void)
>> +{
>> +	unsigned int i;
>> +	u64 mask = 0;
>> +
>> +	for (i = 0; i < I915_PMU_MAX_GTS; i++)
>> +		mask |= config_mask(__I915_PMU_ACTUAL_FREQUENCY(i)) |
>> +			config_mask(__I915_PMU_REQUESTED_FREQUENCY(i));
>> +
>> +	return mask;
>> +}
>> +
>>   static bool pmu_needs_timer(struct i915_pmu *pmu, bool gpu_active)
>>   {
>> 	struct drm_i915_private *i915 = container_of(pmu, typeof(*i915), pmu);
>> @@ -120,9 +147,7 @@ static bool pmu_needs_timer(struct i915_pmu *pmu, bool gpu_active)
>> 	 * Mask out all the ones which do not need the timer, or in
>> 	 * other words keep all the ones that could need the timer.
>> 	 */
>> -	enable &= config_mask(I915_PMU_ACTUAL_FREQUENCY) |
>> -		  config_mask(I915_PMU_REQUESTED_FREQUENCY) |
>> -		  ENGINE_SAMPLE_MASK;
>> +	enable &= frequency_enabled_mask() | ENGINE_SAMPLE_MASK;
> 
> u32 enable & u64 frequency_enabled_mask
> 
> ugly but ok I guess? Or change enable to u64?

Hmm.. yes very ugly. Could have been an accident which happened to work 
because there is a single timer (not per tile).

Similar issue in frequency_sampling_enabled() too; the gt_id argument to it
seems pointless.

So I now think the whole frequency_enabled_mask() is just pointless and
should be removed, and then the pmu_needs_timer() code can stay as is.
Possibly add a config_mask_32 helper which ensures no bits in the upper 32
bits are returned.

That is if we are happy for the frequency_sampling_enabled returning 
true for all gts, regardless of which ones actually have frequency 
sampling enabled.

Or if we want to implement it as I probably have intended, we will need 
to add some gt bits into pmu->enable. Maybe reserve top four same as 
with config counters.

In this case config_mask() needs to be updated to translate not just
the config counter into the pmu tracked event bits, but also the config's
gt id into the gt bits of pmu->enable.

Sounds easily doable on a first thought.
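One possible shape for the config_mask_32 idea, as a userspace sketch. The helper name is from the suggestion above. The bit layout reuses the assumed sizes of 3 engine samplers and 3 tracked events per GT. The assert stands in for whatever GEM_BUG_ON or WARN the driver would use.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed sizes, as in other_bit() with I915_PMU_MAX_GTS == 4. */
#define ENGINE_SAMPLE_COUNT	3
#define TRACKED_EVENT_COUNT	3
#define PMU_MAX_GTS		4

static unsigned int other_bit(unsigned int gt_id, unsigned int val)
{
	return ENGINE_SAMPLE_COUNT + gt_id * TRACKED_EVENT_COUNT + val;
}

/* Return the tracked-event bit as a 32-bit mask, refusing any bit that a
 * u32 pmu->enable could not hold. */
static uint32_t config_mask_32(unsigned int bit)
{
	assert(bit < 32);
	return UINT32_C(1) << bit;
}
```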

> 
>>
>> 	/*
>> 	 * When the GPU is idle per-engine counters do not need to be
>> @@ -164,9 +189,37 @@ static inline s64 ktime_since_raw(const ktime_t kt)
>> 	return ktime_to_ns(ktime_sub(ktime_get_raw(), kt));
>>   }
>>
>> +static unsigned int
>> +__sample_idx(struct i915_pmu *pmu, unsigned int gt_id, int sample)
>> +{
>> +	unsigned int idx = gt_id * __I915_NUM_PMU_SAMPLERS + sample;
>> +
>> +	GEM_BUG_ON(idx >= ARRAY_SIZE(pmu->sample));
> 
> Does this GEM_BUG_ON need to be split up as follows:
> 
> 	GEM_BUG_ON(gt_id >= I915_PMU_MAX_GTS);
> 	GEM_BUG_ON(sample >= __I915_NUM_PMU_SAMPLERS);
> 
> Since that is what we really mean here isn't it?

ARRAY_SIZE check seems the safest option to me, given it is defined as:

  sample[I915_PMU_MAX_GTS * __I915_NUM_PMU_SAMPLERS];

What problem do you see here?

>> +
>> +	return idx;
>> +}
>> +
>> +static u64 read_sample(struct i915_pmu *pmu, unsigned int gt_id, int sample)
>> +{
>> +	return pmu->sample[__sample_idx(pmu, gt_id, sample)].cur;
>> +}
>> +
>> +static void
>> +store_sample(struct i915_pmu *pmu, unsigned int gt_id, int sample, u64 val)
>> +{
>> +	pmu->sample[__sample_idx(pmu, gt_id, sample)].cur = val;
>> +}
>> +
>> +static void
>> +add_sample_mult(struct i915_pmu *pmu, unsigned int gt_id, int sample, u32 val, u32 mul)
>> +{
>> +	pmu->sample[__sample_idx(pmu, gt_id, sample)].cur += mul_u32_u32(val, mul);
>> +}
> 
> Gripe: I think this code should have per event data structures which store
> all information about a particular event. Rather than storing it in these
> arrays common to all events (and in bit-fields common to all events) which
> results in the kind of dance we have to do here. Anyway too big a change to
> make now but something to consider if we ever do this for xe.

Could do a two dimensional array like:

  sample[I915_PMU_MAX_GTS][__I915_NUM_PMU_SAMPLERS];

Any better? Honestly I don't remember if there was a special reason I 
went for a flat array back then.
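A sketch of the two-dimensional variant, under the same assumed sampler count. The storage layout is identical to the flat array. What changes is that each dimension gets its own bound. That also covers the split GEM_BUG_ON suggestion, because an out-of-range sampler can no longer alias another GT's slot.

```c
#include <assert.h>
#include <stdint.h>

#define PMU_MAX_GTS	4
#define NUM_SAMPLERS	4 /* assumed sampler count per GT */
#define ARRAY_SIZE(a)	(sizeof(a) / sizeof((a)[0]))

struct pmu_sample {
	uint64_t cur;
};

/* Same memory footprint as sample[PMU_MAX_GTS * NUM_SAMPLERS]. */
static struct pmu_sample sample[PMU_MAX_GTS][NUM_SAMPLERS];

static struct pmu_sample *sample_slot(unsigned int gt_id, unsigned int s)
{
	assert(gt_id < ARRAY_SIZE(sample));	/* GT bound */
	assert(s < ARRAY_SIZE(sample[0]));	/* sampler bound */
	return &sample[gt_id][s];
}
```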

> 
>> +
>>   static u64 get_rc6(struct intel_gt *gt)
>>   {
>> 	struct drm_i915_private *i915 = gt->i915;
>> +	const unsigned int gt_id = gt->info.id;
>> 	struct i915_pmu *pmu = &i915->pmu;
>> 	unsigned long flags;
>> 	bool awake = false;
>> @@ -181,7 +234,7 @@ static u64 get_rc6(struct intel_gt *gt)
>> 	spin_lock_irqsave(&pmu->lock, flags);
>>
>> 	if (awake) {
>> -		pmu->sample[__I915_SAMPLE_RC6].cur = val;
>> +		store_sample(pmu, gt_id, __I915_SAMPLE_RC6, val);
>> 	} else {
>> 		/*
>> 		 * We think we are runtime suspended.
>> @@ -190,14 +243,14 @@ static u64 get_rc6(struct intel_gt *gt)
>> 		 * on top of the last known real value, as the approximated RC6
>> 		 * counter value.
>> 		 */
>> -		val = ktime_since_raw(pmu->sleep_last);
>> -		val += pmu->sample[__I915_SAMPLE_RC6].cur;
>> +		val = ktime_since_raw(pmu->sleep_last[gt_id]);
>> +		val += read_sample(pmu, gt_id, __I915_SAMPLE_RC6);
>> 	}
>>
>> -	if (val < pmu->sample[__I915_SAMPLE_RC6_LAST_REPORTED].cur)
>> -		val = pmu->sample[__I915_SAMPLE_RC6_LAST_REPORTED].cur;
>> +	if (val < read_sample(pmu, gt_id, __I915_SAMPLE_RC6_LAST_REPORTED))
>> +		val = read_sample(pmu, gt_id, __I915_SAMPLE_RC6_LAST_REPORTED);
>> 	else
>> -		pmu->sample[__I915_SAMPLE_RC6_LAST_REPORTED].cur = val;
>> +		store_sample(pmu, gt_id, __I915_SAMPLE_RC6_LAST_REPORTED, val);
>>
>> 	spin_unlock_irqrestore(&pmu->lock, flags);
>>
>> @@ -207,13 +260,20 @@ static u64 get_rc6(struct intel_gt *gt)
>>   static void init_rc6(struct i915_pmu *pmu)
>>   {
>> 	struct drm_i915_private *i915 = container_of(pmu, typeof(*i915), pmu);
>> -	intel_wakeref_t wakeref;
>> +	struct intel_gt *gt;
>> +	unsigned int i;
>> +
>> +	for_each_gt(gt, i915, i) {
>> +		intel_wakeref_t wakeref;
>>
>> -	with_intel_runtime_pm(to_gt(i915)->uncore->rpm, wakeref) {
>> -		pmu->sample[__I915_SAMPLE_RC6].cur = __get_rc6(to_gt(i915));
>> -		pmu->sample[__I915_SAMPLE_RC6_LAST_REPORTED].cur =
>> -					pmu->sample[__I915_SAMPLE_RC6].cur;
>> -		pmu->sleep_last = ktime_get_raw();
>> +		with_intel_runtime_pm(gt->uncore->rpm, wakeref) {
>> +			u64 val = __get_rc6(gt);
>> +
>> +			store_sample(pmu, i, __I915_SAMPLE_RC6, val);
>> +			store_sample(pmu, i, __I915_SAMPLE_RC6_LAST_REPORTED,
>> +				     val);
>> +			pmu->sleep_last[i] = ktime_get_raw();
>> +		}
>> 	}
>>   }
>>
>> @@ -221,8 +281,8 @@ static void park_rc6(struct intel_gt *gt)
>>   {
>> 	struct i915_pmu *pmu = &gt->i915->pmu;
>>
>> -	pmu->sample[__I915_SAMPLE_RC6].cur = __get_rc6(gt);
>> -	pmu->sleep_last = ktime_get_raw();
>> +	store_sample(pmu, gt->info.id, __I915_SAMPLE_RC6, __get_rc6(gt));
>> +	pmu->sleep_last[gt->info.id] = ktime_get_raw();
>>   }
>>
>>   static void __i915_pmu_maybe_start_timer(struct i915_pmu *pmu)
>> @@ -362,34 +422,30 @@ engines_sample(struct intel_gt *gt, unsigned int period_ns)
>> 	}
>>   }
>>
>> -static void
>> -add_sample_mult(struct i915_pmu_sample *sample, u32 val, u32 mul)
>> -{
>> -	sample->cur += mul_u32_u32(val, mul);
>> -}
>> -
>> -static bool frequency_sampling_enabled(struct i915_pmu *pmu)
>> +static bool
>> +frequency_sampling_enabled(struct i915_pmu *pmu, unsigned int gt)
>>   {
>> 	return pmu->enable &
>> -	       (config_mask(I915_PMU_ACTUAL_FREQUENCY) |
>> -		config_mask(I915_PMU_REQUESTED_FREQUENCY));
>> +	       (config_mask(__I915_PMU_ACTUAL_FREQUENCY(gt)) |
>> +		config_mask(__I915_PMU_REQUESTED_FREQUENCY(gt)));
> 
> Here again:
> 
> 	u32 pmu->enable & u64 config_mask
> 
> Probably ok?
> 
> And also in i915_pmu_enable() we have:
> 
> 	pmu->enable |= BIT_ULL(bit);
> 
> So change pmu->enable to u64?
> 
>>   }
>>
>>   static void
>>   frequency_sample(struct intel_gt *gt, unsigned int period_ns)
>>   {
>> 	struct drm_i915_private *i915 = gt->i915;
>> +	const unsigned int gt_id = gt->info.id;
>> 	struct i915_pmu *pmu = &i915->pmu;
>> 	struct intel_rps *rps = &gt->rps;
>>
>> -	if (!frequency_sampling_enabled(pmu))
>> +	if (!frequency_sampling_enabled(pmu, gt_id))
>> 		return;
> 
> Pre-existing issue, but why do we need this check? This is already checked
> in the two individual checks for actual and requested freq below:
> 
> 	if (pmu->enable & config_mask(__I915_PMU_ACTUAL_FREQUENCY(gt_id)))
> 
> 	and
> 
> 	if (pmu->enable & config_mask(__I915_PMU_REQUESTED_FREQUENCY(gt_id)))
> 
> So can we delete frequency_sampling_enabled()? Or is it there to avoid the
> overhead of intel_gt_pm_get_if_awake() (which doesn't seem to be much)?

I think it was to avoid taking even an already active pm ref when
frequency events are not enabled. The timer could be running, for instance,
if only engine wait/sema is enabled. So yes, it is just a little bit cheaper
than pm get + async put, and it avoids prolonging the delayed put for no
reason (the timer races with regular GT pm activities; see
mod_delayed_work in __intel_wakeref_put_last).
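A toy model of the early return described above. `gt_pm_get_if_awake()` is a hypothetical stub that only counts calls; it is not the driver function. With only an engine event enabled, the early check means the timer tick never touches the GT wakeref.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Counts how often the sampling tick touches the GT wakeref. */
static unsigned int wakeref_gets;

/* Hypothetical stand-in for intel_gt_pm_get_if_awake(). */
static bool gt_pm_get_if_awake(void)
{
	wakeref_gets++;
	return true;
}

/* One timer tick; freq_mask holds this GT's frequency event bits. */
static void frequency_tick(uint32_t enable, uint32_t freq_mask, bool early_out)
{
	if (early_out && !(enable & freq_mask))
		return;		/* no frequency event: skip the pm get */

	if (!gt_pm_get_if_awake())
		return;

	/* Per-counter sampling and the matching async put would go here. */
}
```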

> 
>>
>> 	/* Report 0/0 (actual/requested) frequency while parked. */
>> 	if (!intel_gt_pm_get_if_awake(gt))
>> 		return;
>>
>> -	if (pmu->enable & config_mask(I915_PMU_ACTUAL_FREQUENCY)) {
>> +	if (pmu->enable & config_mask(__I915_PMU_ACTUAL_FREQUENCY(gt_id))) {
>> 		u32 val;
>>
>> 		/*
>> @@ -405,12 +461,12 @@ frequency_sample(struct intel_gt *gt, unsigned int period_ns)
>> 		if (!val)
>> 			val = intel_gpu_freq(rps, rps->cur_freq);
>>
>> -		add_sample_mult(&pmu->sample[__I915_SAMPLE_FREQ_ACT],
>> +		add_sample_mult(pmu, gt_id, __I915_SAMPLE_FREQ_ACT,
>> 				val, period_ns / 1000);
>> 	}
>>
>> -	if (pmu->enable & config_mask(I915_PMU_REQUESTED_FREQUENCY)) {
>> -		add_sample_mult(&pmu->sample[__I915_SAMPLE_FREQ_REQ],
>> +	if (pmu->enable & config_mask(__I915_PMU_REQUESTED_FREQUENCY(gt_id))) {
>> +		add_sample_mult(pmu, gt_id, __I915_SAMPLE_FREQ_REQ,
>> 				intel_rps_get_requested_frequency(rps),
>> 				period_ns / 1000);
>> 	}
>> @@ -447,9 +503,7 @@ static enum hrtimer_restart i915_sample(struct hrtimer *hrtimer)
>> 			continue;
>>
>> 		engines_sample(gt, period_ns);
>> -
>> -		if (i == 0) /* FIXME */
>> -			frequency_sample(gt, period_ns);
>> +		frequency_sample(gt, period_ns);
>> 	}
>>
>> 	hrtimer_forward(hrtimer, now, ns_to_ktime(PERIOD));
>> @@ -491,7 +545,12 @@ config_status(struct drm_i915_private *i915, u64 config)
>>   {
>> 	struct intel_gt *gt = to_gt(i915);
>>
>> -	switch (config) {
>> +	unsigned int gt_id = config_gt_id(config);
>> +
>> +	if (gt_id)
>> +		return -ENOENT;
> 
> This is just wrong. It is fixed in the next patch:
> 
> 	if (gt_id > max_gt_id)
> 		return -ENOENT;
> 
> But probably should be fixed in this patch itself. Or dropped from this
> patch and let it come in in Patch 6, since it's confusing. Though it
> probably belongs in this patch.

Hmm, my thinking was probably to reject gt > 0 in this patch since only
the last patch was supposed to expose the other tiles. Granted, that
is not entirely true since this patch already makes them accessible
via i915_drm.h. The last patch only makes them discoverable via sysfs.

In this case yes, I'd pull in "gt_id > max_gt_id" into this patch. And 
this hunk from the next patch too:

  	case I915_PMU_INTERRUPTS:
+		if (gt_id)
+			return -ENOENT;
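A userspace model of config_status() with both hunks pulled into this patch. It assumes `__I915_PMU_OTHER(0)` is 0x100000 and takes `max_gt_id` as a parameter. Patch 6 computes that value as `HAS_EXTRA_GT_LIST(i915) ? 1 : 0`. The model omits the per-platform checks (VLV/CHV frequency, RC6 support). Interrupts are rejected for GT > 0 because pmu->irq_count is a single global counter.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define PMU_GT_SHIFT	60
#define PMU_OTHER_BASE	0x100000ULL /* assumed value of __I915_PMU_OTHER(0) */

enum { FREQ_ACT, FREQ_REQ, INTERRUPTS, RC6, GT_AWAKE_TIME };

static int config_status_model(uint64_t config, unsigned int max_gt_id)
{
	unsigned int gt_id = (unsigned int)(config >> PMU_GT_SHIFT);
	uint64_t counter = config & ~(~0ULL << PMU_GT_SHIFT);

	if (gt_id > max_gt_id)
		return -ENOENT;

	/* Configs below PMU_OTHER_BASE wrap here and hit the default case. */
	switch (counter - PMU_OTHER_BASE) {
	case INTERRUPTS:
		if (gt_id)	/* global counter, exposed on GT 0 only */
			return -ENOENT;
		break;
	case FREQ_ACT:
	case FREQ_REQ:
	case RC6:
	case GT_AWAKE_TIME:
		break;
	default:
		return -ENOENT;
	}

	return 0;
}
```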

> 
>> +
>> +	switch (config_counter(config)) {
>> 	case I915_PMU_ACTUAL_FREQUENCY:
>> 		if (IS_VALLEYVIEW(i915) || IS_CHERRYVIEW(i915))
>> 			/* Requires a mutex for sampling! */
>> @@ -599,22 +658,27 @@ static u64 __i915_pmu_event_read(struct perf_event *event)
>> 			val = engine->pmu.sample[sample].cur;
>> 		}
>> 	} else {
>> -		switch (event->attr.config) {
>> +		const unsigned int gt_id = config_gt_id(event->attr.config);
>> +		const u64 config = config_counter(event->attr.config);
>> +
>> +		switch (config) {
>> 		case I915_PMU_ACTUAL_FREQUENCY:
>> 			val =
>> -			   div_u64(pmu->sample[__I915_SAMPLE_FREQ_ACT].cur,
>> +			   div_u64(read_sample(pmu, gt_id,
>> +					       __I915_SAMPLE_FREQ_ACT),
>> 				   USEC_PER_SEC /* to MHz */);
>> 			break;
>> 		case I915_PMU_REQUESTED_FREQUENCY:
>> 			val =
>> -			   div_u64(pmu->sample[__I915_SAMPLE_FREQ_REQ].cur,
>> +			   div_u64(read_sample(pmu, gt_id,
>> +					       __I915_SAMPLE_FREQ_REQ),
>> 				   USEC_PER_SEC /* to MHz */);
>> 			break;
>> 		case I915_PMU_INTERRUPTS:
>> 			val = READ_ONCE(pmu->irq_count);
>> 			break;
>> 		case I915_PMU_RC6_RESIDENCY:
>> -			val = get_rc6(to_gt(i915));
>> +			val = get_rc6(i915->gt[gt_id]);
>> 			break;
>> 		case I915_PMU_SOFTWARE_GT_AWAKE_TIME:
>> 			val = ktime_to_ns(intel_gt_get_awake_time(to_gt(i915)));
>> diff --git a/drivers/gpu/drm/i915/i915_pmu.h b/drivers/gpu/drm/i915/i915_pmu.h
>> index 3a811266ac6a..d47846f21ddf 100644
>> --- a/drivers/gpu/drm/i915/i915_pmu.h
>> +++ b/drivers/gpu/drm/i915/i915_pmu.h
>> @@ -38,13 +38,16 @@ enum {
>> 	__I915_NUM_PMU_SAMPLERS
>>   };
>>
>> +#define I915_PMU_MAX_GTS (4)
> 
> 4 or (4)? :-)

Bike shed was strong with you on the day of review I see. :)

I would rather get rid of this define altogether if we could use the
"normal" MAX_GT define. As I was saying earlier, I think this one was
here just because header dependencies were too convoluted back then.
Maybe things are better today? Probably worth a try.

> 
>> +
>>   /*
>>    * How many different events we track in the global PMU mask.
>>    *
>>    * It is also used to know to needed number of event reference counters.
>>    */
>>   #define I915_PMU_MASK_BITS \
>> -	(I915_ENGINE_SAMPLE_COUNT + __I915_PMU_TRACKED_EVENT_COUNT)
>> +	(I915_ENGINE_SAMPLE_COUNT + \
>> +	 I915_PMU_MAX_GTS * __I915_PMU_TRACKED_EVENT_COUNT)
>>
>>   #define I915_ENGINE_SAMPLE_COUNT (I915_SAMPLE_SEMA + 1)
>>
>> @@ -124,11 +127,11 @@ struct i915_pmu {
>> 	 * Only global counters are held here, while the per-engine ones are in
>> 	 * struct intel_engine_cs.
>> 	 */
>> -	struct i915_pmu_sample sample[__I915_NUM_PMU_SAMPLERS];
>> +	struct i915_pmu_sample sample[I915_PMU_MAX_GTS * __I915_NUM_PMU_SAMPLERS];
>> 	/**
>> 	 * @sleep_last: Last time GT parked for RC6 estimation.
>> 	 */
>> -	ktime_t sleep_last;
>> +	ktime_t sleep_last[I915_PMU_MAX_GTS];
>> 	/**
>> 	 * @irq_count: Number of interrupts
>> 	 *
>> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
>> index dba7c5a5b25e..d5ac1fdeb2b1 100644
>> --- a/include/uapi/drm/i915_drm.h
>> +++ b/include/uapi/drm/i915_drm.h
>> @@ -280,7 +280,16 @@ enum drm_i915_pmu_engine_sample {
>>   #define I915_PMU_ENGINE_SEMA(class, instance) \
>> 	__I915_PMU_ENGINE(class, instance, I915_SAMPLE_SEMA)
>>
>> -#define __I915_PMU_OTHER(x) (__I915_PMU_ENGINE(0xff, 0xff, 0xf) + 1 + (x))
>> +/*
>> + * Top 4 bits of every non-engine counter are GT id.
>> + */
>> +#define __I915_PMU_GT_SHIFT (60)
> 
> REG_GENMASK64 or GENMASK_ULL would be nicer but of course we can't put in
> the uapi header, so ok.

Yep.

>> +
>> +#define ___I915_PMU_OTHER(gt, x) \
>> +	(((__u64)__I915_PMU_ENGINE(0xff, 0xff, 0xf) + 1 + (x)) | \
>> +	((__u64)(gt) << __I915_PMU_GT_SHIFT))
>> +
>> +#define __I915_PMU_OTHER(x) ___I915_PMU_OTHER(0, x)
>>
>>   #define I915_PMU_ACTUAL_FREQUENCY	__I915_PMU_OTHER(0)
>>   #define I915_PMU_REQUESTED_FREQUENCY	__I915_PMU_OTHER(1)
>> @@ -290,6 +299,12 @@ enum drm_i915_pmu_engine_sample {
>>
>>   #define I915_PMU_LAST /* Deprecated - do not use */ I915_PMU_RC6_RESIDENCY
>>
>> +#define __I915_PMU_ACTUAL_FREQUENCY(gt)		___I915_PMU_OTHER(gt, 0)
>> +#define __I915_PMU_REQUESTED_FREQUENCY(gt)	___I915_PMU_OTHER(gt, 1)
>> +#define __I915_PMU_INTERRUPTS(gt)		___I915_PMU_OTHER(gt, 2)
>> +#define __I915_PMU_RC6_RESIDENCY(gt)		___I915_PMU_OTHER(gt, 3)
>> +#define __I915_PMU_SOFTWARE_GT_AWAKE_TIME(gt)	___I915_PMU_OTHER(gt, 4)
>> +
>>   /* Each region is a minimum of 16k, and there are at most 255 of them.
>>    */
>>   #define I915_NR_TEX_REGIONS 255	/* table size 2k - maximum due to use
>> --
>> 2.36.1
>>
> 
> Above comments are mostly nits so after addressing the above comments, this
> is:
> 
> Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>

Well you found some ugly bits (or I got confused, double check me 
please) so I'd say hold off with r-b just yet. Sadly it's on Umesh now 
to fix up my mess. :I

Regards,

Tvrtko


* Re: [Intel-gfx] [PATCH 6/6] drm/i915/pmu: Export counters from all tiles
  2023-05-11 18:57   ` Dixit, Ashutosh
@ 2023-05-12 10:57     ` Tvrtko Ursulin
  2023-05-12 17:08       ` Dixit, Ashutosh
  0 siblings, 1 reply; 45+ messages in thread
From: Tvrtko Ursulin @ 2023-05-12 10:57 UTC (permalink / raw)
  To: Dixit, Ashutosh, Umesh Nerlige Ramappa; +Cc: intel-gfx


On 11/05/2023 19:57, Dixit, Ashutosh wrote:
> On Fri, 05 May 2023 17:58:16 -0700, Umesh Nerlige Ramappa wrote:
>>
> 
> One drive-by comment:
> 
>> diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
>> index 12b2f3169abf..284e5c5b97bb 100644
>> --- a/drivers/gpu/drm/i915/i915_pmu.c
>> +++ b/drivers/gpu/drm/i915/i915_pmu.c
>> @@ -546,8 +546,9 @@ config_status(struct drm_i915_private *i915, u64 config)
>> 	struct intel_gt *gt = to_gt(i915);
>>
>> 	unsigned int gt_id = config_gt_id(config);
>> +	unsigned int max_gt_id = HAS_EXTRA_GT_LIST(i915) ? 1 : 0;
> 
> But in Patch 5 we have:
> 
> #define I915_PMU_MAX_GTS (4)

AFAIR that one is just to size the internal arrays, while max_gt_id is 
to report to userspace which events are present.

Regards,

Tvrtko

> 
>>
>> -	if (gt_id)
>> +	if (gt_id > max_gt_id)
>> 		return -ENOENT;
>>
>> 	switch (config_counter(config)) {
>> @@ -561,6 +562,8 @@ config_status(struct drm_i915_private *i915, u64 config)
>> 			return -ENODEV;
>> 		break;
>> 	case I915_PMU_INTERRUPTS:
>> +		if (gt_id)
>> +			return -ENOENT;
>> 		break;
>> 	case I915_PMU_RC6_RESIDENCY:
>> 		if (!gt->rc6.supported)
> 
> Thanks.
> --
> Ashutosh

* Re: [Intel-gfx] [PATCH 6/6] drm/i915/pmu: Export counters from all tiles
  2023-05-12 10:57     ` Tvrtko Ursulin
@ 2023-05-12 17:08       ` Dixit, Ashutosh
  2023-05-12 18:53         ` Umesh Nerlige Ramappa
  0 siblings, 1 reply; 45+ messages in thread
From: Dixit, Ashutosh @ 2023-05-12 17:08 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

On Fri, 12 May 2023 03:57:35 -0700, Tvrtko Ursulin wrote:
>
>
> On 11/05/2023 19:57, Dixit, Ashutosh wrote:
> > On Fri, 05 May 2023 17:58:16 -0700, Umesh Nerlige Ramappa wrote:
> >>
> >
> > One drive-by comment:
> >
> >> diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
> >> index 12b2f3169abf..284e5c5b97bb 100644
> >> --- a/drivers/gpu/drm/i915/i915_pmu.c
> >> +++ b/drivers/gpu/drm/i915/i915_pmu.c
> >> @@ -546,8 +546,9 @@ config_status(struct drm_i915_private *i915, u64 config)
> >>	struct intel_gt *gt = to_gt(i915);
> >>
> >>	unsigned int gt_id = config_gt_id(config);
> >> +	unsigned int max_gt_id = HAS_EXTRA_GT_LIST(i915) ? 1 : 0;
> >
> > But in Patch 5 we have:
> >
> > #define I915_PMU_MAX_GTS (4)
>
> AFAIR that one is just to size the internal arrays, while max_gt_id is to
> report to userspace which events are present.

Hmm, apart from the #defines in i915_drm.h in Patch 5, not seeing
anything else reported to userspace about which events are present.

Also, we already have I915_MAX_GT, we shouldn't need I915_PMU_MAX_GTS, or
at least:

	#define I915_PMU_MAX_GTS I915_MAX_GT

Better to use things uniformly. If we want I915_PMU_MAX_GTS to be 2 instead
of I915_MAX_GT (but why?, below is just a check) let's do

	#define I915_PMU_MAX_GTS 2

And use that in the code above. But I think we should just use I915_MAX_GT.

Thanks.
--
Ashutosh


> >
> >>
> >> -	if (gt_id)
> >> +	if (gt_id > max_gt_id)
> >>		return -ENOENT;
> >>
> >>	switch (config_counter(config)) {
> >> @@ -561,6 +562,8 @@ config_status(struct drm_i915_private *i915, u64 config)
> >>			return -ENODEV;
> >>		break;
> >>	case I915_PMU_INTERRUPTS:
> >> +		if (gt_id)
> >> +			return -ENOENT;
> >>		break;
> >>	case I915_PMU_RC6_RESIDENCY:
> >>		if (!gt->rc6.supported)

* Re: [Intel-gfx] [PATCH 6/6] drm/i915/pmu: Export counters from all tiles
  2023-05-12 17:08       ` Dixit, Ashutosh
@ 2023-05-12 18:53         ` Umesh Nerlige Ramappa
  2023-05-12 20:10           ` Dixit, Ashutosh
  0 siblings, 1 reply; 45+ messages in thread
From: Umesh Nerlige Ramappa @ 2023-05-12 18:53 UTC (permalink / raw)
  To: Dixit, Ashutosh; +Cc: intel-gfx

On Fri, May 12, 2023 at 10:08:58AM -0700, Dixit, Ashutosh wrote:
>On Fri, 12 May 2023 03:57:35 -0700, Tvrtko Ursulin wrote:
>>
>>
>> On 11/05/2023 19:57, Dixit, Ashutosh wrote:
>> > On Fri, 05 May 2023 17:58:16 -0700, Umesh Nerlige Ramappa wrote:
>> >>
>> >
>> > One drive-by comment:
>> >
>> >> diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
>> >> index 12b2f3169abf..284e5c5b97bb 100644
>> >> --- a/drivers/gpu/drm/i915/i915_pmu.c
>> >> +++ b/drivers/gpu/drm/i915/i915_pmu.c
>> >> @@ -546,8 +546,9 @@ config_status(struct drm_i915_private *i915, u64 config)
>> >>	struct intel_gt *gt = to_gt(i915);
>> >>
>> >>	unsigned int gt_id = config_gt_id(config);
>> >> +	unsigned int max_gt_id = HAS_EXTRA_GT_LIST(i915) ? 1 : 0;
>> >
>> > But in Patch 5 we have:
>> >
>> > #define I915_PMU_MAX_GTS (4)
>>
>> AFAIR that one is just to size the internal arrays, while max_gt_id is to
>> report to userspace which events are present.
>
>Hmm, apart from the #defines in i915_drm.h in Patch 5, not seeing
>anything else reported to userspace about which events are present.

Ex: We have only gt0 and gt1 on MTL. When user configures an event (sets 
event id, tile id etc on the config parameter) and calls the 
perf_event_open, it results in i915_pmu_event_init() -> config_status() 
which will return an ENOENT if the event was for say gt2 or gt3. This is 
for runtime check only.
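To illustrate the runtime check: userspace puts the GT id in bits 63:60 of perf_event_attr.config, and config_status() decodes it at event init. The sketch below reproduces the uapi macros from patch 5 with uint64_t in place of __u64. The engine-layout macros (__I915_PMU_ENGINE and its shift constants) are not shown in this thread and are copied from i915_drm.h as an assumption; config_gt_id() mirrors the helper in i915_pmu.c.

```c
#include <assert.h>
#include <stdint.h>

/* Engine-config layout; assumed to match include/uapi/drm/i915_drm.h. */
#define I915_PMU_SAMPLE_BITS		4
#define I915_PMU_SAMPLE_INSTANCE_BITS	8
#define I915_PMU_CLASS_SHIFT \
	(I915_PMU_SAMPLE_BITS + I915_PMU_SAMPLE_INSTANCE_BITS)
#define __I915_PMU_ENGINE(class, instance, sample) \
	((class) << I915_PMU_CLASS_SHIFT | \
	 (instance) << I915_PMU_SAMPLE_BITS | \
	 (sample))

/* GT-aware "other" counters as proposed in patch 5 (uint64_t for __u64). */
#define __I915_PMU_GT_SHIFT (60)
#define ___I915_PMU_OTHER(gt, x) \
	(((uint64_t)__I915_PMU_ENGINE(0xff, 0xff, 0xf) + 1 + (x)) | \
	 ((uint64_t)(gt) << __I915_PMU_GT_SHIFT))
#define __I915_PMU_RC6_RESIDENCY(gt)	___I915_PMU_OTHER(gt, 3)

/* What the kernel extracts from attr.config before range-checking it. */
static unsigned int config_gt_id(uint64_t config)
{
	return config >> __I915_PMU_GT_SHIFT;
}
```

On MTL a config such as __I915_PMU_RC6_RESIDENCY(2) encodes gt_id 2, which config_status() is expected to reject with -ENOENT when perf_event_open() initializes the event.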

>
>Also, we already have I915_MAX_GT, we shouldn't need I915_PMU_MAX_GTS, or
>at least:
>
>	#define I915_PMU_MAX_GTS I915_MAX_GT
>
>Better to use things uniformly. If we want I915_PMU_MAX_GTS to be 2 instead
>of I915_MAX_GT (but why?, below is just a check) let's do
>
>	#define I915_PMU_MAX_GTS 2
>
>And use that in the code above. But I think we should just use I915_MAX_GT.

Agree,

Thanks,
Umesh
>
>Thanks.
>--
>Ashutosh
>
>
>> >
>> >>
>> >> -	if (gt_id)
>> >> +	if (gt_id > max_gt_id)
>> >>		return -ENOENT;
>> >>
>> >>	switch (config_counter(config)) {
>> >> @@ -561,6 +562,8 @@ config_status(struct drm_i915_private *i915, u64 config)
>> >>			return -ENODEV;
>> >>		break;
>> >>	case I915_PMU_INTERRUPTS:
>> >> +		if (gt_id)
>> >> +			return -ENOENT;
>> >>		break;
>> >>	case I915_PMU_RC6_RESIDENCY:
>> >>		if (!gt->rc6.supported)

* Re: [Intel-gfx] [PATCH 6/6] drm/i915/pmu: Export counters from all tiles
  2023-05-12 18:53         ` Umesh Nerlige Ramappa
@ 2023-05-12 20:10           ` Dixit, Ashutosh
  0 siblings, 0 replies; 45+ messages in thread
From: Dixit, Ashutosh @ 2023-05-12 20:10 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-gfx

On Fri, 12 May 2023 11:53:32 -0700, Umesh Nerlige Ramappa wrote:
>
> On Fri, May 12, 2023 at 10:08:58AM -0700, Dixit, Ashutosh wrote:
> > On Fri, 12 May 2023 03:57:35 -0700, Tvrtko Ursulin wrote:
> >>
> >>
> >> On 11/05/2023 19:57, Dixit, Ashutosh wrote:
> >> > On Fri, 05 May 2023 17:58:16 -0700, Umesh Nerlige Ramappa wrote:
> >> >>
> >> >
> >> > One drive-by comment:
> >> >
> >> >> diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
> >> >> index 12b2f3169abf..284e5c5b97bb 100644
> >> >> --- a/drivers/gpu/drm/i915/i915_pmu.c
> >> >> +++ b/drivers/gpu/drm/i915/i915_pmu.c
> >> >> @@ -546,8 +546,9 @@ config_status(struct drm_i915_private *i915, u64 config)
> >> >>	struct intel_gt *gt = to_gt(i915);
> >> >>
> >> >>	unsigned int gt_id = config_gt_id(config);
> >> >> +	unsigned int max_gt_id = HAS_EXTRA_GT_LIST(i915) ? 1 : 0;
> >> >
> >> > But in Patch 5 we have:
> >> >
> >> > #define I915_PMU_MAX_GTS (4)
> >>
> >> AFAIR that one is just to size the internal arrays, while max_gt_id is to
> >> report to userspace which events are present.
> >
> > Hmm, apart from the #defines in i915_drm.h in Patch 5, not seeing
> > anything else reported to userspace about which events are present.
>
> Ex: We have only gt0 and gt1 on MTL. When user configures an event (sets
> event id, tile id etc on the config parameter) and calls the
> perf_event_open, it results in i915_pmu_event_init() -> config_status()
> which will return an ENOENT if the event was for say gt2 or gt3. This is
> for runtime check only.

Ah ok, sorry I missed that. xe has a tile_count field, but in i915 there's
no easy way to find the number of GTs short of using for_each_gt() and
incrementing a count, which seems like overkill. So what we have above is
fine.
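For comparison, a minimal sketch of the for_each_gt() counting approach. All fake_* names are hypothetical; the macro only mimics the shape of i915's for_each_gt(), which walks the i915->gt[] array and skips unpopulated slots.

```c
#include <assert.h>
#include <stddef.h>

#define FAKE_MAX_GT 4	/* hypothetical stand-in for I915_MAX_GT */

struct fake_gt { unsigned int id; };
struct fake_i915 { struct fake_gt *gt[FAKE_MAX_GT]; };

/* Visit only populated GT slots, like i915's for_each_gt(). */
#define fake_for_each_gt(gt__, i915__, id__) \
	for ((id__) = 0; (id__) < FAKE_MAX_GT; (id__)++) \
		if (((gt__) = (i915__)->gt[(id__)]) != NULL)

/* Count the GTs present on this device. */
static unsigned int count_gts(struct fake_i915 *i915)
{
	struct fake_gt *gt;
	unsigned int id, n = 0;

	fake_for_each_gt(gt, i915, id)
		n++;

	return n;
}
```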

Thanks.
--
Ashutosh


>
> >
> > Also, we already have I915_MAX_GT, we shouldn't need I915_PMU_MAX_GTS, or
> > at least:
> >
> >	#define I915_PMU_MAX_GTS I915_MAX_GT
> >
> > Better to use things uniformly. If we want I915_PMU_MAX_GTS to be 2 instead
> > of I915_MAX_GT (but why?, below is just a check) let's do
> >
> >	#define I915_PMU_MAX_GTS 2
> >
> > And use that in the code above. But I think we should just use I915_MAX_GT.
>
> Agree,
>
> Thanks,
> Umesh
> >
> > Thanks.
> > --
> > Ashutosh
> >
> >
> >> >
> >> >>
> >> >> -	if (gt_id)
> >> >> +	if (gt_id > max_gt_id)
> >> >>		return -ENOENT;
> >> >>
> >> >>	switch (config_counter(config)) {
> >> >> @@ -561,6 +562,8 @@ config_status(struct drm_i915_private *i915, u64 config)
> >> >>			return -ENODEV;
> >> >>		break;
> >> >>	case I915_PMU_INTERRUPTS:
> >> >> +		if (gt_id)
> >> >> +			return -ENOENT;
> >> >>		break;
> >> >>	case I915_PMU_RC6_RESIDENCY:
> >> >>		if (!gt->rc6.supported)

* Re: [Intel-gfx] [PATCH 5/6] drm/i915/pmu: Prepare for multi-tile non-engine counters
  2023-05-12 10:56     ` Tvrtko Ursulin
@ 2023-05-12 20:57       ` Umesh Nerlige Ramappa
  2023-05-12 22:37         ` Umesh Nerlige Ramappa
                           ` (2 more replies)
  0 siblings, 3 replies; 45+ messages in thread
From: Umesh Nerlige Ramappa @ 2023-05-12 20:57 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

On Fri, May 12, 2023 at 11:56:18AM +0100, Tvrtko Ursulin wrote:
>
>On 12/05/2023 02:08, Dixit, Ashutosh wrote:
>>On Fri, 05 May 2023 17:58:15 -0700, Umesh Nerlige Ramappa wrote:
>>>
>>>From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>
>>>Reserve some bits in the counter config namespace which will carry the
>>>tile id and prepare the code to handle this.
>>>
>>>No per tile counters have been added yet.
>>>
>>>v2:
>>>- Fix checkpatch issues
>>>- Use 4 bits for gt id in non-engine counters. Drop FIXME.
>>>- Set MAX GTs to 4. Drop FIXME.
>>>
>>>Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
>>>---
>>>  drivers/gpu/drm/i915/i915_pmu.c | 150 +++++++++++++++++++++++---------
>>>  drivers/gpu/drm/i915/i915_pmu.h |   9 +-
>>>  include/uapi/drm/i915_drm.h     |  17 +++-
>>>  3 files changed, 129 insertions(+), 47 deletions(-)
>>>
>>>diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
>>>index 669a42e44082..12b2f3169abf 100644
>>>--- a/drivers/gpu/drm/i915/i915_pmu.c
>>>+++ b/drivers/gpu/drm/i915/i915_pmu.c
>>>@@ -56,11 +56,21 @@ static bool is_engine_config(u64 config)
>>>	return config < __I915_PMU_OTHER(0);
>>>  }
>>>
>>>+static unsigned int config_gt_id(const u64 config)
>>>+{
>>>+	return config >> __I915_PMU_GT_SHIFT;
>>>+}
>>>+
>>>+static u64 config_counter(const u64 config)
>>>+{
>>>+	return config & ~(~0ULL << __I915_PMU_GT_SHIFT);
>>
>>ok, but another possibility:
>>
>>	return config & ~REG_GENMASK64(63, __I915_PMU_GT_SHIFT);
>
>It's not a register so no. :) GENMASK_ULL maybe but meh.

leaving as is.
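The two spellings discussed above produce the same mask. A minimal sketch with __I915_PMU_GT_SHIFT taken from patch 5; config_counter_alt() is a hypothetical name that spells out in plain C the value ~GENMASK_ULL(63, 60) evaluates to, since the kernel bitops helpers are not available to uapi or userspace code.

```c
#include <assert.h>
#include <stdint.h>

#define __I915_PMU_GT_SHIFT (60)

/* Patch 5 form: clear bit 60 and everything above it. */
static uint64_t config_counter(uint64_t config)
{
	return config & ~(~0ULL << __I915_PMU_GT_SHIFT);
}

/* Same result written as a low-bits mask (bits 59:0 set). */
static uint64_t config_counter_alt(uint64_t config)
{
	return config & ((1ULL << __I915_PMU_GT_SHIFT) - 1);
}
```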

>
>>>+}
>>>+
>>>  static unsigned int other_bit(const u64 config)
>>>  {
>>>	unsigned int val;
>>>
>>>-	switch (config) {
>>>+	switch (config_counter(config)) {
>>>	case I915_PMU_ACTUAL_FREQUENCY:
>>>		val =  __I915_PMU_ACTUAL_FREQUENCY_ENABLED;
>>>		break;
>>>@@ -78,15 +88,20 @@ static unsigned int other_bit(const u64 config)
>>>		return -1;
>>>	}
>>>
>>>-	return I915_ENGINE_SAMPLE_COUNT + val;
>>>+	return I915_ENGINE_SAMPLE_COUNT +
>>>+	       config_gt_id(config) * __I915_PMU_TRACKED_EVENT_COUNT +
>>>+	       val;
>>>  }
>>>
>>>  static unsigned int config_bit(const u64 config)
>>>  {
>>>-	if (is_engine_config(config))
>>>+	if (is_engine_config(config)) {
>>>+		GEM_BUG_ON(config_gt_id(config));
>>
>>This GEM_BUG_ON is not needed since:
>>
>>	static bool is_engine_config(u64 config)
>>	{
>>	        return config < __I915_PMU_OTHER(0);
>>	}
>
>True!

dropping BUG_ON
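Why the GEM_BUG_ON is redundant: any non-zero GT id sets a bit at or above bit 60, so such a config is always at least __I915_PMU_OTHER(0) and is never classified as an engine config. The sketch assumes __I915_PMU_OTHER(0) == 0x100000, which follows from the uapi engine layout (all-ones class/instance/sample plus one); I915_PMU_OTHER_BASE is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define __I915_PMU_GT_SHIFT	(60)
#define I915_PMU_OTHER_BASE	0x100000ULL	/* assumed __I915_PMU_OTHER(0) */

/* Same test as is_engine_config() in i915_pmu.c. */
static bool is_engine_config(uint64_t config)
{
	return config < I915_PMU_OTHER_BASE;
}

static unsigned int config_gt_id(uint64_t config)
{
	return config >> __I915_PMU_GT_SHIFT;
}
```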

>
>>>+
>>>		return engine_config_sample(config);
>>>-	else
>>>+	} else {
>>>		return other_bit(config);
>>>+	}
>>>  }
>>>
>>>  static u64 config_mask(u64 config)
>>>@@ -104,6 +119,18 @@ static unsigned int event_bit(struct perf_event *event)
>>>	return config_bit(event->attr.config);
>>>  }
>>>
>>>+static u64 frequency_enabled_mask(void)
>>>+{
>>>+	unsigned int i;
>>>+	u64 mask = 0;
>>>+
>>>+	for (i = 0; i < I915_PMU_MAX_GTS; i++)
>>>+		mask |= config_mask(__I915_PMU_ACTUAL_FREQUENCY(i)) |
>>>+			config_mask(__I915_PMU_REQUESTED_FREQUENCY(i));
>>>+
>>>+	return mask;
>>>+}
>>>+
>>>  static bool pmu_needs_timer(struct i915_pmu *pmu, bool gpu_active)
>>>  {
>>>	struct drm_i915_private *i915 = container_of(pmu, typeof(*i915), pmu);
>>>@@ -120,9 +147,7 @@ static bool pmu_needs_timer(struct i915_pmu *pmu, bool gpu_active)
>>>	 * Mask out all the ones which do not need the timer, or in
>>>	 * other words keep all the ones that could need the timer.
>>>	 */
>>>-	enable &= config_mask(I915_PMU_ACTUAL_FREQUENCY) |
>>>-		  config_mask(I915_PMU_REQUESTED_FREQUENCY) |
>>>-		  ENGINE_SAMPLE_MASK;
>>>+	enable &= frequency_enabled_mask() | ENGINE_SAMPLE_MASK;
>>
>>u32 enable & u64 frequency_enabled_mask
>>
>>ugly but ok I guess? Or change enable to u64?

making pmu->enable u64, along with the local variables it gets
assigned to.

>
>Hmm.. yes very ugly. Could have been an accident which happened to 
>work because there is a single timer (not per tile).

Happened to work because the frequency mask does not spill over to the 
upper 32 bits (even for multi tile).
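The "does not spill over" claim can be checked with the index arithmetic from other_bit(). This sketch assumes I915_ENGINE_SAMPLE_COUNT == 3 (busy, wait, sema), __I915_PMU_TRACKED_EVENT_COUNT == 3 (actual frequency, requested frequency, RC6) and I915_PMU_MAX_GTS == 4. With those values the highest tracked bit is 14, so every frequency bit fits in a u32. Moving pmu->enable to u64 removes the dependency on these constants staying small.

```c
#include <assert.h>
#include <stdint.h>

/* Values assumed from i915_pmu.h at the time of this series. */
#define I915_ENGINE_SAMPLE_COUNT	3
#define __I915_PMU_TRACKED_EVENT_COUNT	3
#define I915_PMU_MAX_GTS		4

enum { ACT_FREQ_ENABLED = 0, REQ_FREQ_ENABLED = 1, RC6_ENABLED = 2 };

/* Same arithmetic as other_bit() in patch 5. */
static unsigned int other_bit(unsigned int gt_id, unsigned int val)
{
	return I915_ENGINE_SAMPLE_COUNT +
	       gt_id * __I915_PMU_TRACKED_EVENT_COUNT + val;
}

/* Same loop as frequency_enabled_mask() in patch 5. */
static uint64_t frequency_enabled_mask(void)
{
	uint64_t mask = 0;
	unsigned int i;

	for (i = 0; i < I915_PMU_MAX_GTS; i++)
		mask |= (1ULL << other_bit(i, ACT_FREQ_ENABLED)) |
			(1ULL << other_bit(i, REQ_FREQ_ENABLED));

	return mask;
}
```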

--------------------- START_SECTION ----------------
>
>Similar issue in frequency_sampling_enabled too. Gt_id argument to it 
>seems pointless.

Not sure why it's pointless. We need the gt_id to determine the right 
mask for that specific gt. If it's not enabled, then we just return 
without pm_get and async put (like you mention later). 

And this piece of code is called within for_each_gt.

>
>So I now think whole frequency_enabled_mask() is just pointless and 
>should be removed. And then pmu_needs_timer code can stay as is.
>Possibly add a config_mask_32 helper which ensures no bits in upper 32 
>bits are returned.
>
>That is if we are happy for the frequency_sampling_enabled returning 
>true for all gts, regardless of which ones actually have frequency 
>sampling enabled.
>
>Or if we want to implement it as I probably have intended, we will 
>need to add some gt bits into pmu->enable. Maybe reserve top four same 
>as with config counters.

Nope. What you have here works just fine. pmu->enable should not include 
any gt id info. gt_id[63:60] is only a concept for pmu config sent by 
user.  config_mask and pmu->enable are i915 internal bookkeeping (bit 
masks) just to track what events need to be sampled.  The 'other' bit 
masks are a function of gt_id because we use gt_id to calculate a 
contiguous numerical value for these 'other' events. That's all. Once 
the numerical value is calculated, there is no need for gt_id because 
config_mask is BIT_ULL(numerical_value). Since the numerical values 
never exceed 31 (even with multiple GTs), everything works even with a
32-bit pmu->enable.
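A reduced model of the bookkeeping described above: gt_id only picks which bits of pmu->enable to test, and the mask itself carries no GT field. other_mask() is a hypothetical helper equivalent to BIT_ULL(other_bit()); the constants are the same assumed i915_pmu.h values (3 engine samples, 3 tracked events per GT).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define I915_ENGINE_SAMPLE_COUNT	3	/* assumed */
#define __I915_PMU_TRACKED_EVENT_COUNT	3	/* assumed */

enum { ACT_FREQ_ENABLED = 0, REQ_FREQ_ENABLED = 1 };

/* Bit in pmu->enable tracking event 'val' on GT 'gt_id'. */
static uint64_t other_mask(unsigned int gt_id, unsigned int val)
{
	return 1ULL << (I915_ENGINE_SAMPLE_COUNT +
			gt_id * __I915_PMU_TRACKED_EVENT_COUNT + val);
}

/* Per-GT check: only this GT's frequency bits are consulted. */
static bool frequency_sampling_enabled(uint64_t enable, unsigned int gt_id)
{
	return enable & (other_mask(gt_id, ACT_FREQ_ENABLED) |
			 other_mask(gt_id, REQ_FREQ_ENABLED));
}
```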

>
>In this case the config_mask needs to be updated to translate not just 
>the config counter into the pmu tracked event bits, but config counter 
>gt id into the pmu->enabled gt id.
>
>Sounds easily doable on a first thought.

------------------------ END_SECTION ----------------


>>>
>>>	/*
>>>	 * When the GPU is idle per-engine counters do not need to be
>>>@@ -164,9 +189,37 @@ static inline s64 ktime_since_raw(const ktime_t kt)
>>>	return ktime_to_ns(ktime_sub(ktime_get_raw(), kt));
>>>  }
>>>
>>>+static unsigned int
>>>+__sample_idx(struct i915_pmu *pmu, unsigned int gt_id, int sample)
>>>+{
>>>+	unsigned int idx = gt_id * __I915_NUM_PMU_SAMPLERS + sample;
>>>+
>>>+	GEM_BUG_ON(idx >= ARRAY_SIZE(pmu->sample));
>>
>>Does this GEM_BUG_ON need to be split up as follows:
>>
>>	GEM_BUG_ON(gt_id >= I915_PMU_MAX_GTS);
>>	GEM_BUG_ON(sample >= __I915_NUM_PMU_SAMPLERS);
>>
>>Since that is what we really mean here isn't it?
>
>ARRAY_SIZE check seems the safest option to me, given it is defined as:
>
> sample[I915_PMU_MAX_GTS * __I915_NUM_PMU_SAMPLERS];
>
>What problem do you see here?
>
>>>+
>>>+	return idx;
>>>+}
>>>+
>>>+static u64 read_sample(struct i915_pmu *pmu, unsigned int gt_id, int sample)
>>>+{
>>>+	return pmu->sample[__sample_idx(pmu, gt_id, sample)].cur;
>>>+}
>>>+
>>>+static void
>>>+store_sample(struct i915_pmu *pmu, unsigned int gt_id, int sample, u64 val)
>>>+{
>>>+	pmu->sample[__sample_idx(pmu, gt_id, sample)].cur = val;
>>>+}
>>>+
>>>+static void
>>>+add_sample_mult(struct i915_pmu *pmu, unsigned int gt_id, int sample, u32 val, u32 mul)
>>>+{
>>>+	pmu->sample[__sample_idx(pmu, gt_id, sample)].cur += mul_u32_u32(val, mul);
>>>+}
>>
>>Gripe: I think this code should have per event data structures which store
>>all information about a particular event. Rather than storing it in these
>>arrays common to all events (and in bit-fields common to all events) which
>>results in the kind of dance we have to do here. Anyway too big a change to
>>make now but something to consider if we ever do this for xe.
>
>Could do a two dimensional array like:
>
> sample[I915_PMU_MAX_GTS][__I915_NUM_PMU_SAMPLERS];
>
>Any better? Honestly I don't remember if there was a special reason I 
>went for a flat array back then.

Maybe we improve it in XE. I am looking for the shortest path to get 
this merged without any functional issues.
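On the layout question: the flat array and a two-dimensional [gt][sampler] array place each element at the same offset, so switching would be cosmetic. One difference between the two bounds checks, though, is that the ARRAY_SIZE() bound cannot catch a sampler index that overflows into the next GT's slots, whereas the split GEM_BUG_ONs would. The sketch uses assert() in place of GEM_BUG_ON and assumes __I915_NUM_PMU_SAMPLERS == 4; all names are stand-ins.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_GTS		4	/* stand-in for I915_PMU_MAX_GTS */
#define NUM_SAMPLERS	4	/* assumed __I915_NUM_PMU_SAMPLERS */
#define ARRAY_SIZE(a)	(sizeof(a) / sizeof((a)[0]))

struct sample { unsigned long long cur; };

static struct sample flat[MAX_GTS * NUM_SAMPLERS];	/* patch 5 layout */
static struct sample grid[MAX_GTS][NUM_SAMPLERS];	/* 2D alternative */

/* Flat index guarded only by the total array size, as in __sample_idx(). */
static unsigned int sample_idx(unsigned int gt_id, unsigned int sample)
{
	unsigned int idx = gt_id * NUM_SAMPLERS + sample;

	assert(idx < ARRAY_SIZE(flat));
	return idx;
}
```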

>
>>
>>>+
>>>  static u64 get_rc6(struct intel_gt *gt)
>>>  {
>>>	struct drm_i915_private *i915 = gt->i915;
>>>+	const unsigned int gt_id = gt->info.id;
>>>	struct i915_pmu *pmu = &i915->pmu;
>>>	unsigned long flags;
>>>	bool awake = false;
>>>@@ -181,7 +234,7 @@ static u64 get_rc6(struct intel_gt *gt)
>>>	spin_lock_irqsave(&pmu->lock, flags);
>>>
>>>	if (awake) {
>>>-		pmu->sample[__I915_SAMPLE_RC6].cur = val;
>>>+		store_sample(pmu, gt_id, __I915_SAMPLE_RC6, val);
>>>	} else {
>>>		/*
>>>		 * We think we are runtime suspended.
>>>@@ -190,14 +243,14 @@ static u64 get_rc6(struct intel_gt *gt)
>>>		 * on top of the last known real value, as the approximated RC6
>>>		 * counter value.
>>>		 */
>>>-		val = ktime_since_raw(pmu->sleep_last);
>>>-		val += pmu->sample[__I915_SAMPLE_RC6].cur;
>>>+		val = ktime_since_raw(pmu->sleep_last[gt_id]);
>>>+		val += read_sample(pmu, gt_id, __I915_SAMPLE_RC6);
>>>	}
>>>
>>>-	if (val < pmu->sample[__I915_SAMPLE_RC6_LAST_REPORTED].cur)
>>>-		val = pmu->sample[__I915_SAMPLE_RC6_LAST_REPORTED].cur;
>>>+	if (val < read_sample(pmu, gt_id, __I915_SAMPLE_RC6_LAST_REPORTED))
>>>+		val = read_sample(pmu, gt_id, __I915_SAMPLE_RC6_LAST_REPORTED);
>>>	else
>>>-		pmu->sample[__I915_SAMPLE_RC6_LAST_REPORTED].cur = val;
>>>+		store_sample(pmu, gt_id, __I915_SAMPLE_RC6_LAST_REPORTED, val);
>>>
>>>	spin_unlock_irqrestore(&pmu->lock, flags);
>>>
>>>@@ -207,13 +260,20 @@ static u64 get_rc6(struct intel_gt *gt)
>>>  static void init_rc6(struct i915_pmu *pmu)
>>>  {
>>>	struct drm_i915_private *i915 = container_of(pmu, typeof(*i915), pmu);
>>>-	intel_wakeref_t wakeref;
>>>+	struct intel_gt *gt;
>>>+	unsigned int i;
>>>+
>>>+	for_each_gt(gt, i915, i) {
>>>+		intel_wakeref_t wakeref;
>>>
>>>-	with_intel_runtime_pm(to_gt(i915)->uncore->rpm, wakeref) {
>>>-		pmu->sample[__I915_SAMPLE_RC6].cur = __get_rc6(to_gt(i915));
>>>-		pmu->sample[__I915_SAMPLE_RC6_LAST_REPORTED].cur =
>>>-					pmu->sample[__I915_SAMPLE_RC6].cur;
>>>-		pmu->sleep_last = ktime_get_raw();
>>>+		with_intel_runtime_pm(gt->uncore->rpm, wakeref) {
>>>+			u64 val = __get_rc6(gt);
>>>+
>>>+			store_sample(pmu, i, __I915_SAMPLE_RC6, val);
>>>+			store_sample(pmu, i, __I915_SAMPLE_RC6_LAST_REPORTED,
>>>+				     val);
>>>+			pmu->sleep_last[i] = ktime_get_raw();
>>>+		}
>>>	}
>>>  }
>>>
>>>@@ -221,8 +281,8 @@ static void park_rc6(struct intel_gt *gt)
>>>  {
>>>	struct i915_pmu *pmu = &gt->i915->pmu;
>>>
>>>-	pmu->sample[__I915_SAMPLE_RC6].cur = __get_rc6(gt);
>>>-	pmu->sleep_last = ktime_get_raw();
>>>+	store_sample(pmu, gt->info.id, __I915_SAMPLE_RC6, __get_rc6(gt));
>>>+	pmu->sleep_last[gt->info.id] = ktime_get_raw();
>>>  }
>>>
>>>  static void __i915_pmu_maybe_start_timer(struct i915_pmu *pmu)
>>>@@ -362,34 +422,30 @@ engines_sample(struct intel_gt *gt, unsigned int period_ns)
>>>	}
>>>  }
>>>
>>>-static void
>>>-add_sample_mult(struct i915_pmu_sample *sample, u32 val, u32 mul)
>>>-{
>>>-	sample->cur += mul_u32_u32(val, mul);
>>>-}
>>>-
>>>-static bool frequency_sampling_enabled(struct i915_pmu *pmu)
>>>+static bool
>>>+frequency_sampling_enabled(struct i915_pmu *pmu, unsigned int gt)
>>>  {
>>>	return pmu->enable &
>>>-	       (config_mask(I915_PMU_ACTUAL_FREQUENCY) |
>>>-		config_mask(I915_PMU_REQUESTED_FREQUENCY));
>>>+	       (config_mask(__I915_PMU_ACTUAL_FREQUENCY(gt)) |
>>>+		config_mask(__I915_PMU_REQUESTED_FREQUENCY(gt)));
>>
>>Here again:
>>
>>	u32 pmu->enable & u64 config_mask
>>
>>Probably ok?
>>
>>And also in i915_pmu_enable() we have:
>>
>>	pmu->enable |= BIT_ULL(bit);
>>
>>So change pmu->enable to u64?

Right, changing to u64

>>
>>>  }
>>>
>>>  static void
>>>  frequency_sample(struct intel_gt *gt, unsigned int period_ns)
>>>  {
>>>	struct drm_i915_private *i915 = gt->i915;
>>>+	const unsigned int gt_id = gt->info.id;
>>>	struct i915_pmu *pmu = &i915->pmu;
>>>	struct intel_rps *rps = &gt->rps;
>>>
>>>-	if (!frequency_sampling_enabled(pmu))
>>>+	if (!frequency_sampling_enabled(pmu, gt_id))
>>>		return;
>>
>>Pre-existing issue, but why do we need this check? This is already checked
>>in the two individual checks for actual and requested freq below:
>>
>>	if (pmu->enable & config_mask(__I915_PMU_ACTUAL_FREQUENCY(gt_id)))
>>
>>	and
>>
>>	if (pmu->enable & config_mask(__I915_PMU_REQUESTED_FREQUENCY(gt_id)))
>>
>>So can we delete frequency_sampling_enabled()? Or is it there to avoid the
>>overhead of intel_gt_pm_get_if_awake() (which doesn't seem to be much)?
>
>I think it was to avoid even getting an already active pm ref if 
>frequency events are not enabled. Timer could be running for instance 
>if only engine wait/sema is enabled. So yeah, just a little bit 
>cheaper than pm get + async put and avoid prolonging the delayed put 
>for no reason. (As the timer races with regular GT pm activities (see 
>mod_delayed_work in __intel_wakeref_put_last).)

leaving as is.
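The ordering Tvrtko describes, sketched with hypothetical fake_* stand-ins for intel_gt_pm_get_if_awake() and the async put: the cheap mask test comes first, so a timer tick that only serves engine events takes no wakeref at all.

```c
#include <assert.h>
#include <stdbool.h>

struct fake_gt {
	bool awake;
	unsigned int pm_gets;	/* wakerefs taken so far */
	bool freq_enabled;	/* frequency_sampling_enabled(pmu, gt_id) */
};

/* Stand-in for intel_gt_pm_get_if_awake(): only succeeds when awake. */
static bool fake_pm_get_if_awake(struct fake_gt *gt)
{
	if (!gt->awake)
		return false;
	gt->pm_gets++;
	return true;
}

/* Stand-in for the async put; nothing to model here. */
static void fake_pm_put_async(struct fake_gt *gt)
{
	(void)gt;
}

static void frequency_sample(struct fake_gt *gt)
{
	/* Bitmask test first: no wakeref traffic when nothing is enabled. */
	if (!gt->freq_enabled)
		return;

	/* Parked GT: accumulate nothing (reads as 0/0). */
	if (!fake_pm_get_if_awake(gt))
		return;

	/* ... accumulate actual/requested frequency here ... */

	fake_pm_put_async(gt);
}
```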

>
>>
>>>
>>>	/* Report 0/0 (actual/requested) frequency while parked. */
>>>	if (!intel_gt_pm_get_if_awake(gt))
>>>		return;
>>>
>>>-	if (pmu->enable & config_mask(I915_PMU_ACTUAL_FREQUENCY)) {
>>>+	if (pmu->enable & config_mask(__I915_PMU_ACTUAL_FREQUENCY(gt_id))) {
>>>		u32 val;
>>>
>>>		/*
>>>@@ -405,12 +461,12 @@ frequency_sample(struct intel_gt *gt, unsigned int period_ns)
>>>		if (!val)
>>>			val = intel_gpu_freq(rps, rps->cur_freq);
>>>
>>>-		add_sample_mult(&pmu->sample[__I915_SAMPLE_FREQ_ACT],
>>>+		add_sample_mult(pmu, gt_id, __I915_SAMPLE_FREQ_ACT,
>>>				val, period_ns / 1000);
>>>	}
>>>
>>>-	if (pmu->enable & config_mask(I915_PMU_REQUESTED_FREQUENCY)) {
>>>-		add_sample_mult(&pmu->sample[__I915_SAMPLE_FREQ_REQ],
>>>+	if (pmu->enable & config_mask(__I915_PMU_REQUESTED_FREQUENCY(gt_id))) {
>>>+		add_sample_mult(pmu, gt_id, __I915_SAMPLE_FREQ_REQ,
>>>				intel_rps_get_requested_frequency(rps),
>>>				period_ns / 1000);
>>>	}
>>>@@ -447,9 +503,7 @@ static enum hrtimer_restart i915_sample(struct hrtimer *hrtimer)
>>>			continue;
>>>
>>>		engines_sample(gt, period_ns);
>>>-
>>>-		if (i == 0) /* FIXME */
>>>-			frequency_sample(gt, period_ns);
>>>+		frequency_sample(gt, period_ns);
>>>	}
>>>
>>>	hrtimer_forward(hrtimer, now, ns_to_ktime(PERIOD));
>>>@@ -491,7 +545,12 @@ config_status(struct drm_i915_private *i915, u64 config)
>>>  {
>>>	struct intel_gt *gt = to_gt(i915);
>>>
>>>-	switch (config) {
>>>+	unsigned int gt_id = config_gt_id(config);
>>>+
>>>+	if (gt_id)
>>>+		return -ENOENT;
>>
>>This is just wrong. It is fixed in the next patch:
>>
>>	if (gt_id > max_gt_id)
>>		return -ENOENT;
>>
>>But probably should be fixed in this patch itself. Or dropped from this
>>patch and let it come in in Patch 6, since it's confusing. Though it
>>probably belongs in this patch.
>
>Hmm my thinking was probably to reject gt > 0 in this patch since only 
>the last patch was supposed to be exposing the other tiles. Granted 
>that is not entirely true since this patch already makes access to 
>them available via i915_drm.h. Last patch only makes them discoverable
>via sysfs.
>
>In this case yes, I'd pull in "gt_id > max_gt_id" into this patch. And 
>this hunk from the next patch too:
>
> 	case I915_PMU_INTERRUPTS:
>+		if (gt_id)
>+			return -ENOENT;
>

pulling in the above snippets from patch 6 to patch 5
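A sketch of config_status() with both snippets pulled into patch 5, reduced to the GT checks. The counter values assume __I915_PMU_OTHER(0) == 0x100000, has_extra_gt_list stands in for HAS_EXTRA_GT_LIST(i915), and the per-platform checks (VLV/CHV frequency, rc6.supported) and the software GT awake time counter are omitted. INTERRUPTS stays gt0-only because pmu->irq_count is a single device-wide counter.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define __I915_PMU_GT_SHIFT (60)

/* Counter ids with the GT bits stripped; values assumed from i915_drm.h. */
enum {
	PMU_ACTUAL_FREQUENCY	= 0x100000,
	PMU_REQUESTED_FREQUENCY	= 0x100001,
	PMU_INTERRUPTS		= 0x100002,
	PMU_RC6_RESIDENCY	= 0x100003,
};

static int config_status_sketch(bool has_extra_gt_list, uint64_t config)
{
	unsigned int gt_id = config >> __I915_PMU_GT_SHIFT;
	unsigned int max_gt_id = has_extra_gt_list ? 1 : 0;

	/* Runtime check: reject GTs this device does not have. */
	if (gt_id > max_gt_id)
		return -ENOENT;

	switch (config & ~(~0ULL << __I915_PMU_GT_SHIFT)) {
	case PMU_INTERRUPTS:
		/* Device-wide counter, exposed on gt0 only. */
		if (gt_id)
			return -ENOENT;
		break;
	case PMU_ACTUAL_FREQUENCY:
	case PMU_REQUESTED_FREQUENCY:
	case PMU_RC6_RESIDENCY:
		break;	/* platform checks omitted */
	default:
		return -ENOENT;
	}

	return 0;
}
```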

>>
>>>+
>>>+	switch (config_counter(config)) {
>>>	case I915_PMU_ACTUAL_FREQUENCY:
>>>		if (IS_VALLEYVIEW(i915) || IS_CHERRYVIEW(i915))
>>>			/* Requires a mutex for sampling! */
>>>@@ -599,22 +658,27 @@ static u64 __i915_pmu_event_read(struct perf_event *event)
>>>			val = engine->pmu.sample[sample].cur;
>>>		}
>>>	} else {
>>>-		switch (event->attr.config) {
>>>+		const unsigned int gt_id = config_gt_id(event->attr.config);
>>>+		const u64 config = config_counter(event->attr.config);
>>>+
>>>+		switch (config) {
>>>		case I915_PMU_ACTUAL_FREQUENCY:
>>>			val =
>>>-			   div_u64(pmu->sample[__I915_SAMPLE_FREQ_ACT].cur,
>>>+			   div_u64(read_sample(pmu, gt_id,
>>>+					       __I915_SAMPLE_FREQ_ACT),
>>>				   USEC_PER_SEC /* to MHz */);
>>>			break;
>>>		case I915_PMU_REQUESTED_FREQUENCY:
>>>			val =
>>>-			   div_u64(pmu->sample[__I915_SAMPLE_FREQ_REQ].cur,
>>>+			   div_u64(read_sample(pmu, gt_id,
>>>+					       __I915_SAMPLE_FREQ_REQ),
>>>				   USEC_PER_SEC /* to MHz */);
>>>			break;
>>>		case I915_PMU_INTERRUPTS:
>>>			val = READ_ONCE(pmu->irq_count);
>>>			break;
>>>		case I915_PMU_RC6_RESIDENCY:
>>>-			val = get_rc6(to_gt(i915));
>>>+			val = get_rc6(i915->gt[gt_id]);
>>>			break;
>>>		case I915_PMU_SOFTWARE_GT_AWAKE_TIME:
>>>			val = ktime_to_ns(intel_gt_get_awake_time(to_gt(i915)));
>>>diff --git a/drivers/gpu/drm/i915/i915_pmu.h b/drivers/gpu/drm/i915/i915_pmu.h
>>>index 3a811266ac6a..d47846f21ddf 100644
>>>--- a/drivers/gpu/drm/i915/i915_pmu.h
>>>+++ b/drivers/gpu/drm/i915/i915_pmu.h
>>>@@ -38,13 +38,16 @@ enum {
>>>	__I915_NUM_PMU_SAMPLERS
>>>  };
>>>
>>>+#define I915_PMU_MAX_GTS (4)
>>
>>4 or (4)? :-)
>
>Bike shed was strong with you on the day of review I see. :)
>
>I would rather get rid of this define altogether if we could use the 
>"normal" MAX_GT define. As I was saying earlier, I think this one was 
>here just because header dependencies were too convoluted back then.
>Maybe today things are better? Worth a try probably.

dropping this and using I915_MAX_GT

>
>>
>>>+
>>>  /*
>>>   * How many different events we track in the global PMU mask.
>>>   *
>>>   * It is also used to know to needed number of event reference counters.
>>>   */
>>>  #define I915_PMU_MASK_BITS \
>>>-	(I915_ENGINE_SAMPLE_COUNT + __I915_PMU_TRACKED_EVENT_COUNT)
>>>+	(I915_ENGINE_SAMPLE_COUNT + \
>>>+	 I915_PMU_MAX_GTS * __I915_PMU_TRACKED_EVENT_COUNT)
>>>
>>>  #define I915_ENGINE_SAMPLE_COUNT (I915_SAMPLE_SEMA + 1)
>>>
>>>@@ -124,11 +127,11 @@ struct i915_pmu {
>>>	 * Only global counters are held here, while the per-engine ones are in
>>>	 * struct intel_engine_cs.
>>>	 */
>>>-	struct i915_pmu_sample sample[__I915_NUM_PMU_SAMPLERS];
>>>+	struct i915_pmu_sample sample[I915_PMU_MAX_GTS * __I915_NUM_PMU_SAMPLERS];
>>>	/**
>>>	 * @sleep_last: Last time GT parked for RC6 estimation.
>>>	 */
>>>-	ktime_t sleep_last;
>>>+	ktime_t sleep_last[I915_PMU_MAX_GTS];
>>>	/**
>>>	 * @irq_count: Number of interrupts
>>>	 *
>>>diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
>>>index dba7c5a5b25e..d5ac1fdeb2b1 100644
>>>--- a/include/uapi/drm/i915_drm.h
>>>+++ b/include/uapi/drm/i915_drm.h
>>>@@ -280,7 +280,16 @@ enum drm_i915_pmu_engine_sample {
>>>  #define I915_PMU_ENGINE_SEMA(class, instance) \
>>>	__I915_PMU_ENGINE(class, instance, I915_SAMPLE_SEMA)
>>>
>>>-#define __I915_PMU_OTHER(x) (__I915_PMU_ENGINE(0xff, 0xff, 0xf) + 1 + (x))
>>>+/*
>>>+ * Top 4 bits of every non-engine counter are GT id.
>>>+ */
>>>+#define __I915_PMU_GT_SHIFT (60)
>>
>>REG_GENMASK64 or GENMASK_ULL would be nicer but of course we can't put in
>>the uapi header, so ok.
>
>Yep.

leaving as is.

>
>>>+
>>>+#define ___I915_PMU_OTHER(gt, x) \
>>>+	(((__u64)__I915_PMU_ENGINE(0xff, 0xff, 0xf) + 1 + (x)) | \
>>>+	((__u64)(gt) << __I915_PMU_GT_SHIFT))
>>>+
>>>+#define __I915_PMU_OTHER(x) ___I915_PMU_OTHER(0, x)
>>>
>>>  #define I915_PMU_ACTUAL_FREQUENCY	__I915_PMU_OTHER(0)
>>>  #define I915_PMU_REQUESTED_FREQUENCY	__I915_PMU_OTHER(1)
>>>@@ -290,6 +299,12 @@ enum drm_i915_pmu_engine_sample {
>>>
>>>  #define I915_PMU_LAST /* Deprecated - do not use */ I915_PMU_RC6_RESIDENCY
>>>
>>>+#define __I915_PMU_ACTUAL_FREQUENCY(gt)		___I915_PMU_OTHER(gt, 0)
>>>+#define __I915_PMU_REQUESTED_FREQUENCY(gt)	___I915_PMU_OTHER(gt, 1)
>>>+#define __I915_PMU_INTERRUPTS(gt)		___I915_PMU_OTHER(gt, 2)
>>>+#define __I915_PMU_RC6_RESIDENCY(gt)		___I915_PMU_OTHER(gt, 3)
>>>+#define __I915_PMU_SOFTWARE_GT_AWAKE_TIME(gt)	___I915_PMU_OTHER(gt, 4)
>>>+
>>>  /* Each region is a minimum of 16k, and there are at most 255 of them.
>>>   */
>>>  #define I915_NR_TEX_REGIONS 255	/* table size 2k - maximum due to use
>>>--
>>>2.36.1
>>>
>>
>>Above comments are mostly nits so after addressing the above comments, this
>>is:
>>
>>Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
>
>Well you found some ugly bits (or I got confused, double check me 
>please) so I'd say hold off with r-b just yet. Sadly it's on Umesh now 
>to fix up my mess. :I

I don't see anything wrong with the SECTION I marked above, i.e. the
pmu_needs_timer() logic and the sampling code for events that need to be
sampled. If you agree, I can spin the next revision.

Thanks,
Umesh

>
>Regards,
>
>Tvrtko

* Re: [Intel-gfx] [PATCH 4/6] drm/i915/pmu: Add reference counting to the sampling timer
  2023-05-06  0:58 ` [Intel-gfx] [PATCH 4/6] drm/i915/pmu: Add reference counting to the sampling timer Umesh Nerlige Ramappa
  2023-05-08 17:58   ` Umesh Nerlige Ramappa
  2023-05-09 17:25   ` Dixit, Ashutosh
@ 2023-05-12 22:29   ` Dixit, Ashutosh
  2023-05-12 22:44     ` Umesh Nerlige Ramappa
  2 siblings, 1 reply; 45+ messages in thread
From: Dixit, Ashutosh @ 2023-05-12 22:29 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-gfx

On Fri, 05 May 2023 17:58:14 -0700, Umesh Nerlige Ramappa wrote:
>

Hi Umesh/Tvrtko,

> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>
> We do not want to have timers per tile and waste CPU cycles and energy via
> multiple wake-up sources, for a relatively unimportant task of PMU
> sampling, so keeping a single timer works well. But we also do not want
> the first GT which goes idle to turn off the timer.
>
> Add some reference counting, via a mask of unparked GTs, to solve this.
>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_pmu.c | 12 ++++++++++--
>  drivers/gpu/drm/i915/i915_pmu.h |  4 ++++
>  2 files changed, 14 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
> index 2b63ee31e1b3..669a42e44082 100644
> --- a/drivers/gpu/drm/i915/i915_pmu.c
> +++ b/drivers/gpu/drm/i915/i915_pmu.c
> @@ -251,7 +251,9 @@ void i915_pmu_gt_parked(struct intel_gt *gt)
>	 * Signal sampling timer to stop if only engine events are enabled and
>	 * GPU went idle.
>	 */
> -	pmu->timer_enabled = pmu_needs_timer(pmu, false);
> +	pmu->unparked &= ~BIT(gt->info.id);
> +	if (pmu->unparked == 0)
> +		pmu->timer_enabled = pmu_needs_timer(pmu, false);
>
>	spin_unlock_irq(&pmu->lock);
>  }
> @@ -268,7 +270,10 @@ void i915_pmu_gt_unparked(struct intel_gt *gt)
>	/*
>	 * Re-enable sampling timer when GPU goes active.
>	 */
> -	__i915_pmu_maybe_start_timer(pmu);
> +	if (pmu->unparked == 0)
> +		__i915_pmu_maybe_start_timer(pmu);
> +
> +	pmu->unparked |= BIT(gt->info.id);
>
>	spin_unlock_irq(&pmu->lock);
>  }
> @@ -438,6 +443,9 @@ static enum hrtimer_restart i915_sample(struct hrtimer *hrtimer)
>	 */
>
>	for_each_gt(gt, i915, i) {
> +		if (!(pmu->unparked & BIT(i)))
> +			continue;
> +

This is not correct. In this series we are at least sampling frequencies
(calling frequency_sample) even when GT is parked. So these 3 lines should be
deleted. engines_sample will get called and will return without doing
anything if engine events are disabled.

Thanks.
--
Ashutosh


>		engines_sample(gt, period_ns);
>
>		if (i == 0) /* FIXME */
> diff --git a/drivers/gpu/drm/i915/i915_pmu.h b/drivers/gpu/drm/i915/i915_pmu.h
> index a686fd7ccedf..3a811266ac6a 100644
> --- a/drivers/gpu/drm/i915/i915_pmu.h
> +++ b/drivers/gpu/drm/i915/i915_pmu.h
> @@ -76,6 +76,10 @@ struct i915_pmu {
>	 * @lock: Lock protecting enable mask and ref count handling.
>	 */
>	spinlock_t lock;
> +	/**
> +	 * @unparked: GT unparked mask.
> +	 */
> +	unsigned int unparked;
>	/**
>	 * @timer: Timer for internal i915 PMU sampling.
>	 */
> --
> 2.36.1
>
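
The unparked-mask reference counting described in the commit message above can be modeled in plain C. This is a simplified sketch, not the i915 code: all names are hypothetical, the pmu->lock is omitted, and pmu_needs_timer() / __i915_pmu_maybe_start_timer() are reduced to a flag plus a start counter. It assumes no events need the timer while every GT is idle.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the patch's unparked-mask logic (no locking). */
struct pmu_model {
	unsigned int unparked;      /* bit N set => GT N is unparked */
	bool timer_enabled;         /* stands in for pmu->timer_enabled */
	unsigned int timer_starts;  /* how often the shared timer was started */
};

static void model_gt_unparked(struct pmu_model *pmu, unsigned int gt_id)
{
	/* Only the first GT to wake up (re)starts the single shared timer. */
	if (pmu->unparked == 0) {
		pmu->timer_enabled = true;
		pmu->timer_starts++;
	}
	pmu->unparked |= 1u << gt_id;
}

static void model_gt_parked(struct pmu_model *pmu, unsigned int gt_id)
{
	pmu->unparked &= ~(1u << gt_id);
	/*
	 * Only the last GT to park may stop the timer. The real code asks
	 * pmu_needs_timer(pmu, false) here; this model assumes it says no.
	 */
	if (pmu->unparked == 0)
		pmu->timer_enabled = false;
}
```

With two GTs, parking GT 0 while GT 1 is still active leaves the timer running; only parking GT 1 as well turns it off.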


* Re: [Intel-gfx] [PATCH 5/6] drm/i915/pmu: Prepare for multi-tile non-engine counters
  2023-05-12 20:57       ` Umesh Nerlige Ramappa
@ 2023-05-12 22:37         ` Umesh Nerlige Ramappa
  2023-05-13  1:09         ` Dixit, Ashutosh
  2023-05-15 10:10         ` Tvrtko Ursulin
  2 siblings, 0 replies; 45+ messages in thread
From: Umesh Nerlige Ramappa @ 2023-05-12 22:37 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

On Fri, May 12, 2023 at 01:57:59PM -0700, Umesh Nerlige Ramappa wrote:
>On Fri, May 12, 2023 at 11:56:18AM +0100, Tvrtko Ursulin wrote:
>>
>>On 12/05/2023 02:08, Dixit, Ashutosh wrote:
>>>On Fri, 05 May 2023 17:58:15 -0700, Umesh Nerlige Ramappa wrote:
>>>>
>>>>From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>
>>>>Reserve some bits in the counter config namespace which will carry the
>>>>tile id and prepare the code to handle this.
>>>>
>>>>No per tile counters have been added yet.
>>>>
>>>>v2:
>>>>- Fix checkpatch issues
>>>>- Use 4 bits for gt id in non-engine counters. Drop FIXME.
>>>>- Set MAX GTs to 4. Drop FIXME.
>>>>
>>>>Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
>>>>---
>>>> drivers/gpu/drm/i915/i915_pmu.c | 150 +++++++++++++++++++++++---------
>>>> drivers/gpu/drm/i915/i915_pmu.h |   9 +-
>>>> include/uapi/drm/i915_drm.h     |  17 +++-
>>>> 3 files changed, 129 insertions(+), 47 deletions(-)
>>>>
>>>>diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
>>>>index 669a42e44082..12b2f3169abf 100644
>>>>--- a/drivers/gpu/drm/i915/i915_pmu.c
>>>>+++ b/drivers/gpu/drm/i915/i915_pmu.c
>>>>@@ -56,11 +56,21 @@ static bool is_engine_config(u64 config)
>>>>	return config < __I915_PMU_OTHER(0);
>>>> }
>>>>
>>>>+static unsigned int config_gt_id(const u64 config)
>>>>+{
>>>>+	return config >> __I915_PMU_GT_SHIFT;
>>>>+}
>>>>+
>>>>+static u64 config_counter(const u64 config)
>>>>+{
>>>>+	return config & ~(~0ULL << __I915_PMU_GT_SHIFT);
>>>
>>>ok, but another possibility:
>>>
>>>	return config & ~REG_GENMASK64(63, __I915_PMU_GT_SHIFT);
>>
>>It's not a register so no. :) GENMASK_ULL maybe but meh.
>
>leaving as is.
>
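
A small user-space sketch of the two decode helpers being discussed, with the GT id in bits 63..60 as in the uapi hunk of this patch. The function names are hypothetical copies of config_gt_id() and config_counter(). The mask ~(~0ULL << 60) keeps bits 59..0, which is the same value GENMASK_ULL(59, 0) would give in kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Value from the uapi hunk: the top 4 bits of a non-engine config are the GT id. */
#define PMU_GT_SHIFT 60

/* Mirror of config_gt_id(): everything at and above bit 60. */
static unsigned int sketch_config_gt_id(uint64_t config)
{
	return (unsigned int)(config >> PMU_GT_SHIFT);
}

/* Mirror of config_counter(): clear bits 63..60, keep bits 59..0. */
static uint64_t sketch_config_counter(uint64_t config)
{
	return config & ~(~0ULL << PMU_GT_SHIFT);
}
```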
>>
>>>>+}
>>>>+
>>>> static unsigned int other_bit(const u64 config)
>>>> {
>>>>	unsigned int val;
>>>>
>>>>-	switch (config) {
>>>>+	switch (config_counter(config)) {
>>>>	case I915_PMU_ACTUAL_FREQUENCY:
>>>>		val =  __I915_PMU_ACTUAL_FREQUENCY_ENABLED;
>>>>		break;
>>>>@@ -78,15 +88,20 @@ static unsigned int other_bit(const u64 config)
>>>>		return -1;
>>>>	}
>>>>
>>>>-	return I915_ENGINE_SAMPLE_COUNT + val;
>>>>+	return I915_ENGINE_SAMPLE_COUNT +
>>>>+	       config_gt_id(config) * __I915_PMU_TRACKED_EVENT_COUNT +
>>>>+	       val;
>>>> }
>>>>
>>>> static unsigned int config_bit(const u64 config)
>>>> {
>>>>-	if (is_engine_config(config))
>>>>+	if (is_engine_config(config)) {
>>>>+		GEM_BUG_ON(config_gt_id(config));
>>>
>>>This GEM_BUG_ON is not needed since:
>>>
>>>	static bool is_engine_config(u64 config)
>>>	{
>>>	        return config < __I915_PMU_OTHER(0);
>>>	}
>>
>>True!
>
>dropping BUG_ON
>
>>
>>>>+
>>>>		return engine_config_sample(config);
>>>>-	else
>>>>+	} else {
>>>>		return other_bit(config);
>>>>+	}
>>>> }
>>>>
>>>> static u64 config_mask(u64 config)
>>>>@@ -104,6 +119,18 @@ static unsigned int event_bit(struct perf_event *event)
>>>>	return config_bit(event->attr.config);
>>>> }
>>>>
>>>>+static u64 frequency_enabled_mask(void)
>>>>+{
>>>>+	unsigned int i;
>>>>+	u64 mask = 0;
>>>>+
>>>>+	for (i = 0; i < I915_PMU_MAX_GTS; i++)
>>>>+		mask |= config_mask(__I915_PMU_ACTUAL_FREQUENCY(i)) |
>>>>+			config_mask(__I915_PMU_REQUESTED_FREQUENCY(i));
>>>>+
>>>>+	return mask;
>>>>+}
>>>>+
>>>> static bool pmu_needs_timer(struct i915_pmu *pmu, bool gpu_active)
>>>> {
>>>>	struct drm_i915_private *i915 = container_of(pmu, typeof(*i915), pmu);
>>>>@@ -120,9 +147,7 @@ static bool pmu_needs_timer(struct i915_pmu *pmu, bool gpu_active)
>>>>	 * Mask out all the ones which do not need the timer, or in
>>>>	 * other words keep all the ones that could need the timer.
>>>>	 */
>>>>-	enable &= config_mask(I915_PMU_ACTUAL_FREQUENCY) |
>>>>-		  config_mask(I915_PMU_REQUESTED_FREQUENCY) |
>>>>-		  ENGINE_SAMPLE_MASK;
>>>>+	enable &= frequency_enabled_mask() | ENGINE_SAMPLE_MASK;
>>>
>>>u32 enable & u64 frequency_enabled_mask
>>>
>>>ugly but ok I guess? Or change enable to u64?
>
>making pmu->enable u64 as well as other places where it is assigned to 
>local variables.
>
>>
>>Hmm.. yes very ugly. Could have been an accident which happened to 
>>work because there is a single timer (not per tile).
>
>Happened to work because the frequency mask does not spill over to the 
>upper 32 bits (even for multi tile).
>
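
The claim that the "other" bit numbers stay below 32 even with multiple GTs can be checked with a user-space model of other_bit() and frequency_enabled_mask(). The sizes below are assumptions taken from i915_pmu.h around this series (three engine samplers, three tracked events, four GTs) and should be verified against the tree; the names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed sizes: busy/wait/sema samplers, three tracked events, 4 GTs. */
#define ENGINE_SAMPLE_COUNT 3u
#define TRACKED_EVENT_COUNT 3u
#define PMU_MAX_GTS         4u

/* Tracked-event indices, in the same order as the i915 enum. */
enum { ACT_FREQ_ENABLED, REQ_FREQ_ENABLED, RC6_ENABLED };

/* Mirror of other_bit(): one contiguous bit number per (gt, event). */
static unsigned int sketch_other_bit(unsigned int gt_id, unsigned int event)
{
	return ENGINE_SAMPLE_COUNT + gt_id * TRACKED_EVENT_COUNT + event;
}

/* Mirror of frequency_enabled_mask(): frequency bits for every possible GT. */
static uint64_t sketch_frequency_enabled_mask(void)
{
	uint64_t mask = 0;

	for (unsigned int gt = 0; gt < PMU_MAX_GTS; gt++)
		mask |= (1ULL << sketch_other_bit(gt, ACT_FREQ_ENABLED)) |
			(1ULL << sketch_other_bit(gt, REQ_FREQ_ENABLED));

	return mask;
}
```

Under these assumptions the highest bit used is 14 (RC6 on GT 3), so a u32 pmu->enable happens to hold every bit, which is why the mixed u32/u64 arithmetic worked.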
>--------------------- START_SECTION ----------------
>>
>>Similar issue in frequency_sampling_enabled too. Gt_id argument to 
>>it seems pointless.
>
>Not sure why it's pointless. We need the gt_id to determine the right 
>mask for that specific gt. If it's not enabled, then we just return 
>without pm_get and async put (like you mention later).
>
>And this piece of code is called within for_each_gt.
>
>>
>>So I now think whole frequency_enabled_mask() is just pointless and 
>>should be removed. And then pmu_needs_timer code can stay as is. 
>>Possibly add a config_mask_32 helper which ensures no bits in upper 
>>32 bits are returned.
>>
>>That is if we are happy for the frequency_sampling_enabled returning 
>>true for all gts, regardless of which ones actually have frequency 
>>sampling enabled.
>>
>>Or if we want to implement it as I probably have intended, we will 
>>need to add some gt bits into pmu->enable. Maybe reserve top four 
>>same as with config counters.
>
>Nope. What you have here works just fine. pmu->enable should not 
>include any gt id info. gt_id[63:60] is only a concept for pmu config 
>sent by user.  config_mask and pmu->enable are i915 internal 
>bookkeeping (bit masks) just to track what events need to be sampled.  
>The 'other' bit masks are a function of gt_id because we use gt_id to 
>calculate a contiguous numerical value for these 'other' events. 
>That's all. Once the numerical value is calculated, there is no need 
>for gt_id because config_mask is BIT_ULL(numerical_value). Since the 
>numerical values never exceeded 31 (even for multi-gts), everything 
>worked even with 32 bit pmu->enable.
>
>>
>>In this case the config_mask needs to be updated to translate not 
>>just the config counter into the pmu tracked event bits, but config 
>>counter gt id into the pmu->enable gt id.
>>
>>Sounds easily doable at first thought.
>
>------------------------ END_SECTION ----------------
>
>
>>>>
>>>>	/*
>>>>	 * When the GPU is idle per-engine counters do not need to be
>>>>@@ -164,9 +189,37 @@ static inline s64 ktime_since_raw(const ktime_t kt)
>>>>	return ktime_to_ns(ktime_sub(ktime_get_raw(), kt));
>>>> }
>>>>
>>>>+static unsigned int
>>>>+__sample_idx(struct i915_pmu *pmu, unsigned int gt_id, int sample)
>>>>+{
>>>>+	unsigned int idx = gt_id * __I915_NUM_PMU_SAMPLERS + sample;
>>>>+
>>>>+	GEM_BUG_ON(idx >= ARRAY_SIZE(pmu->sample));
>>>
>>>Does this GEM_BUG_ON need to be split up as follows:
>>>
>>>	GEM_BUG_ON(gt_id >= I915_PMU_MAX_GTS);
>>>	GEM_BUG_ON(sample >= __I915_NUM_PMU_SAMPLERS);
>>>
>>>Since that is what we really mean here isn't it?
>>
>>ARRAY_SIZE check seems the safest option to me, given it is defined as:
>>
>>sample[I915_PMU_MAX_GTS * __I915_NUM_PMU_SAMPLERS];
>>
>>What problem do you see here?
>>
>>>>+
>>>>+	return idx;
>>>>+}
>>>>+
>>>>+static u64 read_sample(struct i915_pmu *pmu, unsigned int gt_id, int sample)
>>>>+{
>>>>+	return pmu->sample[__sample_idx(pmu, gt_id, sample)].cur;
>>>>+}
>>>>+
>>>>+static void
>>>>+store_sample(struct i915_pmu *pmu, unsigned int gt_id, int sample, u64 val)
>>>>+{
>>>>+	pmu->sample[__sample_idx(pmu, gt_id, sample)].cur = val;
>>>>+}
>>>>+
>>>>+static void
>>>>+add_sample_mult(struct i915_pmu *pmu, unsigned int gt_id, int sample, u32 val, u32 mul)
>>>>+{
>>>>+	pmu->sample[__sample_idx(pmu, gt_id, sample)].cur += mul_u32_u32(val, mul);
>>>>+}
>>>
>>>Gripe: I think this code should have per event data structures which store
>>>all information about a particular event. Rather than storing it in these
>>>arrays common to all events (and in bit-fields common to all events) which
>>>results in the kind of dance we have to do here. Anyway too big a change to
>>>make now but something to consider if we ever do this for xe.
>>
>>Could do a two dimensional array like:
>>
>>sample[I915_PMU_MAX_GTS][__I915_NUM_PMU_SAMPLERS];
>>
>>Any better? Honestly I don't remember if there was a special reason 
>>I went for a flat array back then.
>
>Maybe we improve it in XE. I am looking for the shortest path to get 
>this merged without any functional issues.
>
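
The flat-array indexing and the two-dimensional alternative can be compared in a user-space sketch. Names are hypothetical; the sampler count of 4 (FREQ_ACT, FREQ_REQ, RC6, RC6_LAST_REPORTED) is an assumption, and assert() stands in for GEM_BUG_ON.

```c
#include <assert.h>
#include <stdint.h>

#define SK_MAX_GTS      4u  /* I915_PMU_MAX_GTS in the patch */
#define SK_NUM_SAMPLERS 4u  /* assumed value of __I915_NUM_PMU_SAMPLERS */
#define SK_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

struct sk_sample { uint64_t cur; };

/* Flat layout used by the patch. */
struct sk_flat_pmu { struct sk_sample sample[SK_MAX_GTS * SK_NUM_SAMPLERS]; };
/* Two-dimensional alternative mentioned in the review. */
struct sk_grid_pmu { struct sk_sample sample[SK_MAX_GTS][SK_NUM_SAMPLERS]; };

/* Mirror of __sample_idx(): row-major index with a single bounds check. */
static unsigned int sk_sample_idx(const struct sk_flat_pmu *pmu,
				  unsigned int gt_id, unsigned int sample)
{
	unsigned int idx = gt_id * SK_NUM_SAMPLERS + sample;

	assert(idx < SK_ARRAY_SIZE(pmu->sample)); /* GEM_BUG_ON analogue */
	return idx;
}
```

Both layouts have the same size and order. The ARRAY_SIZE check prevents out-of-array access, but an out-of-range sampler on GT 0 still lands in GT 1's first slot; only separate checks on gt_id and sample would catch that.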
>>
>>>
>>>>+
>>>> static u64 get_rc6(struct intel_gt *gt)
>>>> {
>>>>	struct drm_i915_private *i915 = gt->i915;
>>>>+	const unsigned int gt_id = gt->info.id;
>>>>	struct i915_pmu *pmu = &i915->pmu;
>>>>	unsigned long flags;
>>>>	bool awake = false;
>>>>@@ -181,7 +234,7 @@ static u64 get_rc6(struct intel_gt *gt)
>>>>	spin_lock_irqsave(&pmu->lock, flags);
>>>>
>>>>	if (awake) {
>>>>-		pmu->sample[__I915_SAMPLE_RC6].cur = val;
>>>>+		store_sample(pmu, gt_id, __I915_SAMPLE_RC6, val);
>>>>	} else {
>>>>		/*
>>>>		 * We think we are runtime suspended.
>>>>@@ -190,14 +243,14 @@ static u64 get_rc6(struct intel_gt *gt)
>>>>		 * on top of the last known real value, as the approximated RC6
>>>>		 * counter value.
>>>>		 */
>>>>-		val = ktime_since_raw(pmu->sleep_last);
>>>>-		val += pmu->sample[__I915_SAMPLE_RC6].cur;
>>>>+		val = ktime_since_raw(pmu->sleep_last[gt_id]);
>>>>+		val += read_sample(pmu, gt_id, __I915_SAMPLE_RC6);
>>>>	}
>>>>
>>>>-	if (val < pmu->sample[__I915_SAMPLE_RC6_LAST_REPORTED].cur)
>>>>-		val = pmu->sample[__I915_SAMPLE_RC6_LAST_REPORTED].cur;
>>>>+	if (val < read_sample(pmu, gt_id, __I915_SAMPLE_RC6_LAST_REPORTED))
>>>>+		val = read_sample(pmu, gt_id, __I915_SAMPLE_RC6_LAST_REPORTED);
>>>>	else
>>>>-		pmu->sample[__I915_SAMPLE_RC6_LAST_REPORTED].cur = val;
>>>>+		store_sample(pmu, gt_id, __I915_SAMPLE_RC6_LAST_REPORTED, val);
>>>>
>>>>	spin_unlock_irqrestore(&pmu->lock, flags);
>>>>
>>>>@@ -207,13 +260,20 @@ static u64 get_rc6(struct intel_gt *gt)
>>>> static void init_rc6(struct i915_pmu *pmu)
>>>> {
>>>>	struct drm_i915_private *i915 = container_of(pmu, typeof(*i915), pmu);
>>>>-	intel_wakeref_t wakeref;
>>>>+	struct intel_gt *gt;
>>>>+	unsigned int i;
>>>>+
>>>>+	for_each_gt(gt, i915, i) {
>>>>+		intel_wakeref_t wakeref;
>>>>
>>>>-	with_intel_runtime_pm(to_gt(i915)->uncore->rpm, wakeref) {
>>>>-		pmu->sample[__I915_SAMPLE_RC6].cur = __get_rc6(to_gt(i915));
>>>>-		pmu->sample[__I915_SAMPLE_RC6_LAST_REPORTED].cur =
>>>>-					pmu->sample[__I915_SAMPLE_RC6].cur;
>>>>-		pmu->sleep_last = ktime_get_raw();
>>>>+		with_intel_runtime_pm(gt->uncore->rpm, wakeref) {
>>>>+			u64 val = __get_rc6(gt);
>>>>+
>>>>+			store_sample(pmu, i, __I915_SAMPLE_RC6, val);
>>>>+			store_sample(pmu, i, __I915_SAMPLE_RC6_LAST_REPORTED,
>>>>+				     val);
>>>>+			pmu->sleep_last[i] = ktime_get_raw();
>>>>+		}
>>>>	}
>>>> }
>>>>
>>>>@@ -221,8 +281,8 @@ static void park_rc6(struct intel_gt *gt)
>>>> {
>>>>	struct i915_pmu *pmu = &gt->i915->pmu;
>>>>
>>>>-	pmu->sample[__I915_SAMPLE_RC6].cur = __get_rc6(gt);
>>>>-	pmu->sleep_last = ktime_get_raw();
>>>>+	store_sample(pmu, gt->info.id, __I915_SAMPLE_RC6, __get_rc6(gt));
>>>>+	pmu->sleep_last[gt->info.id] = ktime_get_raw();
>>>> }
>>>>
>>>> static void __i915_pmu_maybe_start_timer(struct i915_pmu *pmu)
>>>>@@ -362,34 +422,30 @@ engines_sample(struct intel_gt *gt, unsigned int period_ns)
>>>>	}
>>>> }
>>>>
>>>>-static void
>>>>-add_sample_mult(struct i915_pmu_sample *sample, u32 val, u32 mul)
>>>>-{
>>>>-	sample->cur += mul_u32_u32(val, mul);
>>>>-}
>>>>-
>>>>-static bool frequency_sampling_enabled(struct i915_pmu *pmu)
>>>>+static bool
>>>>+frequency_sampling_enabled(struct i915_pmu *pmu, unsigned int gt)
>>>> {
>>>>	return pmu->enable &
>>>>-	       (config_mask(I915_PMU_ACTUAL_FREQUENCY) |
>>>>-		config_mask(I915_PMU_REQUESTED_FREQUENCY));
>>>>+	       (config_mask(__I915_PMU_ACTUAL_FREQUENCY(gt)) |
>>>>+		config_mask(__I915_PMU_REQUESTED_FREQUENCY(gt)));
>>>
>>>Here again:
>>>
>>>	u32 pmu->enable & u64 config_mask
>>>
>>>Probably ok?
>>>
>>>And also in i915_pmu_enable() we have:
>>>
>>>	pmu->enable |= BIT_ULL(bit);
>>>
>>>So change pmu->enable to u64?
>
>Right, changing to u64
>
>>>
>>>> }
>>>>
>>>> static void
>>>> frequency_sample(struct intel_gt *gt, unsigned int period_ns)
>>>> {
>>>>	struct drm_i915_private *i915 = gt->i915;
>>>>+	const unsigned int gt_id = gt->info.id;
>>>>	struct i915_pmu *pmu = &i915->pmu;
>>>>	struct intel_rps *rps = &gt->rps;
>>>>
>>>>-	if (!frequency_sampling_enabled(pmu))
>>>>+	if (!frequency_sampling_enabled(pmu, gt_id))
>>>>		return;
>>>
>>>Pre-existing issue, but why do we need this check? This is already checked
>>>in the two individual checks for actual and requested freq below:
>>>
>>>	if (pmu->enable & config_mask(__I915_PMU_ACTUAL_FREQUENCY(gt_id)))
>>>
>>>	and
>>>
>>>	if (pmu->enable & config_mask(__I915_PMU_REQUESTED_FREQUENCY(gt_id)))
>>>
>>>So can we delete frequency_sampling_enabled()? Or is it there to avoid the
>>>overhead of intel_gt_pm_get_if_awake() (which doesn't seem to be much)?
>>
>>I think it was to avoid even getting an already active pm ref if 
>>frequency events are not enabled. Timer could be running for 
>>instance if only engine wait/sema is enabled. So yeah, just a little 
>>bit cheaper than pm get + async put and avoid prolonging the delayed 
>>put for no reason. (As the timer races with regular GT pm activities 
>>(see mod_delayed_work in __intel_wakeref_put_last).)
>
>leaving as is.
>
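
The cost argument for keeping frequency_sampling_enabled() can be illustrated with a model that counts wakeref attempts. Everything here is hypothetical: the bit layout reuses the assumed three-samplers/three-events numbering, and a counter stands in for intel_gt_pm_get_if_awake().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions for one GT's two frequency events. */
#define ACT_FREQ_BIT(gt) (1ULL << (3 + (gt) * 3 + 0))
#define REQ_FREQ_BIT(gt) (1ULL << (3 + (gt) * 3 + 1))

struct freq_model {
	uint64_t enable;       /* stands in for pmu->enable (now u64) */
	unsigned int pm_gets;  /* stand-in intel_gt_pm_get_if_awake() calls */
	bool awake;
};

/* Mirror of frequency_sampling_enabled(pmu, gt_id). */
static bool model_freq_enabled(const struct freq_model *m, unsigned int gt)
{
	return m->enable & (ACT_FREQ_BIT(gt) | REQ_FREQ_BIT(gt));
}

static void model_frequency_sample(struct freq_model *m, unsigned int gt)
{
	/* Cheap early-out: no wakeref traffic unless this GT needs it. */
	if (!model_freq_enabled(m, gt))
		return;

	m->pm_gets++;
	if (!m->awake)
		return;
	/* ... read actual/requested frequency, then async put ... */
}
```

With only an engine event enabled, the timer still runs but no GT takes a wakeref; enabling a frequency event on GT 1 affects only GT 1, which is also why the gt_id argument matters.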
>>
>>>
>>>>
>>>>	/* Report 0/0 (actual/requested) frequency while parked. */
>>>>	if (!intel_gt_pm_get_if_awake(gt))
>>>>		return;
>>>>
>>>>-	if (pmu->enable & config_mask(I915_PMU_ACTUAL_FREQUENCY)) {
>>>>+	if (pmu->enable & config_mask(__I915_PMU_ACTUAL_FREQUENCY(gt_id))) {
>>>>		u32 val;
>>>>
>>>>		/*
>>>>@@ -405,12 +461,12 @@ frequency_sample(struct intel_gt *gt, unsigned int period_ns)
>>>>		if (!val)
>>>>			val = intel_gpu_freq(rps, rps->cur_freq);
>>>>
>>>>-		add_sample_mult(&pmu->sample[__I915_SAMPLE_FREQ_ACT],
>>>>+		add_sample_mult(pmu, gt_id, __I915_SAMPLE_FREQ_ACT,
>>>>				val, period_ns / 1000);
>>>>	}
>>>>
>>>>-	if (pmu->enable & config_mask(I915_PMU_REQUESTED_FREQUENCY)) {
>>>>-		add_sample_mult(&pmu->sample[__I915_SAMPLE_FREQ_REQ],
>>>>+	if (pmu->enable & config_mask(__I915_PMU_REQUESTED_FREQUENCY(gt_id))) {
>>>>+		add_sample_mult(pmu, gt_id, __I915_SAMPLE_FREQ_REQ,
>>>>				intel_rps_get_requested_frequency(rps),
>>>>				period_ns / 1000);
>>>>	}
>>>>@@ -447,9 +503,7 @@ static enum hrtimer_restart i915_sample(struct hrtimer *hrtimer)
>>>>			continue;
>>>>
>>>>		engines_sample(gt, period_ns);
>>>>-
>>>>-		if (i == 0) /* FIXME */
>>>>-			frequency_sample(gt, period_ns);
>>>>+		frequency_sample(gt, period_ns);
>>>>	}
>>>>
>>>>	hrtimer_forward(hrtimer, now, ns_to_ktime(PERIOD));
>>>>@@ -491,7 +545,12 @@ config_status(struct drm_i915_private *i915, u64 config)
>>>> {
>>>>	struct intel_gt *gt = to_gt(i915);
>>>>
>>>>-	switch (config) {
>>>>+	unsigned int gt_id = config_gt_id(config);
>>>>+
>>>>+	if (gt_id)
>>>>+		return -ENOENT;
>>>
>>>This is just wrong. It is fixed in the next patch:
>>>
>>>	if (gt_id > max_gt_id)
>>>		return -ENOENT;
>>>
>>>But probably should be fixed in this patch itself. Or dropped from this
>>>patch and let it come in in Patch 6, since it's confusing. Though it
>>>probably belongs in this patch.
>>
>>Hmm my thinking was probably to reject gt > 0 in this patch since 
>>only the last patch was supposed to be exposing the other tiles. 
>>Granted that is not entirely true since this patch already makes 
>>access to them available via i915_drm.h. Last patch only makes them 
>>discoverable via sysfs.
>>
>>In this case yes, I'd pull in "gt_id > max_gt_id" into this patch. 
>>And this hunk from the next patch too:
>>
>>	case I915_PMU_INTERRUPTS:
>>+		if (gt_id)
>>+			return -ENOENT;
>>
>
>pulling in the above snippets from patch 6 to patch 5
>
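
The two gt_id checks being moved into this patch can be sketched as a user-space validator. This is hypothetical: the base of the "other" range (0x100000, i.e. __I915_PMU_ENGINE(0xff, 0xff, 0xf) + 1) is an assumed value, and platform-specific checks from the real config_status() are omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define SK_GT_SHIFT   60
#define SK_OTHER_BASE 0x100000ULL /* assumed start of the "other" range */

enum { SK_ACT_FREQ, SK_REQ_FREQ, SK_INTERRUPTS, SK_RC6, SK_AWAKE_TIME };

static int sk_config_status(uint64_t config, unsigned int max_gt_id)
{
	unsigned int gt_id = (unsigned int)(config >> SK_GT_SHIFT);
	uint64_t counter = config & ~(~0ULL << SK_GT_SHIFT);

	/* Reject GTs that do not exist on this device. */
	if (gt_id > max_gt_id)
		return -ENOENT;

	switch (counter) {
	case SK_OTHER_BASE + SK_INTERRUPTS:
		/* Interrupt count is device-wide: only the GT 0 variant exists. */
		if (gt_id)
			return -ENOENT;
		break;
	case SK_OTHER_BASE + SK_ACT_FREQ:
	case SK_OTHER_BASE + SK_REQ_FREQ:
	case SK_OTHER_BASE + SK_RC6:
	case SK_OTHER_BASE + SK_AWAKE_TIME:
		/* Platform and other per-counter checks omitted in this sketch. */
		break;
	default:
		return -ENOENT;
	}

	return 0;
}
```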
>>>
>>>>+
>>>>+	switch (config_counter(config)) {
>>>>	case I915_PMU_ACTUAL_FREQUENCY:
>>>>		if (IS_VALLEYVIEW(i915) || IS_CHERRYVIEW(i915))
>>>>			/* Requires a mutex for sampling! */
>>>>@@ -599,22 +658,27 @@ static u64 __i915_pmu_event_read(struct perf_event *event)
>>>>			val = engine->pmu.sample[sample].cur;
>>>>		}
>>>>	} else {
>>>>-		switch (event->attr.config) {
>>>>+		const unsigned int gt_id = config_gt_id(event->attr.config);
>>>>+		const u64 config = config_counter(event->attr.config);
>>>>+
>>>>+		switch (config) {
>>>>		case I915_PMU_ACTUAL_FREQUENCY:
>>>>			val =
>>>>-			   div_u64(pmu->sample[__I915_SAMPLE_FREQ_ACT].cur,
>>>>+			   div_u64(read_sample(pmu, gt_id,
>>>>+					       __I915_SAMPLE_FREQ_ACT),
>>>>				   USEC_PER_SEC /* to MHz */);
>>>>			break;
>>>>		case I915_PMU_REQUESTED_FREQUENCY:
>>>>			val =
>>>>-			   div_u64(pmu->sample[__I915_SAMPLE_FREQ_REQ].cur,
>>>>+			   div_u64(read_sample(pmu, gt_id,
>>>>+					       __I915_SAMPLE_FREQ_REQ),
>>>>				   USEC_PER_SEC /* to MHz */);
>>>>			break;
>>>>		case I915_PMU_INTERRUPTS:
>>>>			val = READ_ONCE(pmu->irq_count);
>>>>			break;
>>>>		case I915_PMU_RC6_RESIDENCY:
>>>>-			val = get_rc6(to_gt(i915));
>>>>+			val = get_rc6(i915->gt[gt_id]);
>>>>			break;
>>>>		case I915_PMU_SOFTWARE_GT_AWAKE_TIME:
>>>>			val = ktime_to_ns(intel_gt_get_awake_time(to_gt(i915)));
>>>>diff --git a/drivers/gpu/drm/i915/i915_pmu.h b/drivers/gpu/drm/i915/i915_pmu.h
>>>>index 3a811266ac6a..d47846f21ddf 100644
>>>>--- a/drivers/gpu/drm/i915/i915_pmu.h
>>>>+++ b/drivers/gpu/drm/i915/i915_pmu.h
>>>>@@ -38,13 +38,16 @@ enum {
>>>>	__I915_NUM_PMU_SAMPLERS
>>>> };
>>>>
>>>>+#define I915_PMU_MAX_GTS (4)
>>>
>>>4 or (4)? :-)
>>
>>Bike shed was strong with you on the day of review I see. :)
>>
>>I would rather get rid of this define altogether if we could use the 
>>"normal" MAX_GT define. As I was saying earlier, I think this one 
>>was here just because header dependencies were too convoluted back 
>>then. Maybe today things are better? Worth a try probably.
>
>dropping this and using I915_MAX_GTS

Okay, I see what you mentioned here about the header dependencies. It's 
still the same. i915_drv.h includes intel_engine.h which includes 
i915_pmu.h. I cannot include i915_drv.h to get I915_MAX_GTS; doing so causes 
all sorts of compile errors (likely cyclic). I don't see a quick way to 
resolve that, so I am going to leave it as I915_PMU_MAX_GTS.

Thanks,
Umesh

>
>>
>>>
>>>>+
>>>> /*
>>>>  * How many different events we track in the global PMU mask.
>>>>  *
>>>>  * It is also used to know to needed number of event reference counters.
>>>>  */
>>>> #define I915_PMU_MASK_BITS \
>>>>-	(I915_ENGINE_SAMPLE_COUNT + __I915_PMU_TRACKED_EVENT_COUNT)
>>>>+	(I915_ENGINE_SAMPLE_COUNT + \
>>>>+	 I915_PMU_MAX_GTS * __I915_PMU_TRACKED_EVENT_COUNT)
>>>>
>>>> #define I915_ENGINE_SAMPLE_COUNT (I915_SAMPLE_SEMA + 1)
>>>>
>>>>@@ -124,11 +127,11 @@ struct i915_pmu {
>>>>	 * Only global counters are held here, while the per-engine ones are in
>>>>	 * struct intel_engine_cs.
>>>>	 */
>>>>-	struct i915_pmu_sample sample[__I915_NUM_PMU_SAMPLERS];
>>>>+	struct i915_pmu_sample sample[I915_PMU_MAX_GTS * __I915_NUM_PMU_SAMPLERS];
>>>>	/**
>>>>	 * @sleep_last: Last time GT parked for RC6 estimation.
>>>>	 */
>>>>-	ktime_t sleep_last;
>>>>+	ktime_t sleep_last[I915_PMU_MAX_GTS];
>>>>	/**
>>>>	 * @irq_count: Number of interrupts
>>>>	 *
>>>>diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
>>>>index dba7c5a5b25e..d5ac1fdeb2b1 100644
>>>>--- a/include/uapi/drm/i915_drm.h
>>>>+++ b/include/uapi/drm/i915_drm.h
>>>>@@ -280,7 +280,16 @@ enum drm_i915_pmu_engine_sample {
>>>> #define I915_PMU_ENGINE_SEMA(class, instance) \
>>>>	__I915_PMU_ENGINE(class, instance, I915_SAMPLE_SEMA)
>>>>
>>>>-#define __I915_PMU_OTHER(x) (__I915_PMU_ENGINE(0xff, 0xff, 0xf) + 1 + (x))
>>>>+/*
>>>>+ * Top 4 bits of every non-engine counter are GT id.
>>>>+ */
>>>>+#define __I915_PMU_GT_SHIFT (60)
>>>
>>>REG_GENMASK64 or GENMASK_ULL would be nicer but of course we can't put in
>>>the uapi header, so ok.
>>
>>Yep.
>
>leaving as is.
>
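
From the user-space side, the new uapi macros compose a per-GT config value for perf_event_attr.config. The sketch below copies the encoding under hypothetical SK_ names; the engine-config ceiling 0xfffff is an assumed value of __I915_PMU_ENGINE(0xff, 0xff, 0xf). GT 0 values equal the legacy I915_PMU_* values, since __I915_PMU_OTHER(x) is ___I915_PMU_OTHER(0, x).

```c
#include <assert.h>
#include <stdint.h>

#define SK_GT_SHIFT       60
#define SK_ENGINE_CEILING 0xfffffULL /* assumed __I915_PMU_ENGINE(0xff, 0xff, 0xf) */

/* Copy of ___I915_PMU_OTHER(gt, x) from the uapi hunk. */
#define SK_PMU_OTHER(gt, x) \
	((SK_ENGINE_CEILING + 1 + (uint64_t)(x)) | \
	 ((uint64_t)(gt) << SK_GT_SHIFT))

#define SK_ACTUAL_FREQUENCY(gt)    SK_PMU_OTHER(gt, 0)
#define SK_REQUESTED_FREQUENCY(gt) SK_PMU_OTHER(gt, 1)
#define SK_RC6_RESIDENCY(gt)       SK_PMU_OTHER(gt, 3)

/* Fill @out with the RC6 residency config for GTs 0..n-1. */
static void sk_rc6_configs(uint64_t *out, unsigned int n)
{
	for (unsigned int gt = 0; gt < n; gt++)
		out[gt] = SK_RC6_RESIDENCY(gt);
}
```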
>>
>>>>+
>>>>+#define ___I915_PMU_OTHER(gt, x) \
>>>>+	(((__u64)__I915_PMU_ENGINE(0xff, 0xff, 0xf) + 1 + (x)) | \
>>>>+	((__u64)(gt) << __I915_PMU_GT_SHIFT))
>>>>+
>>>>+#define __I915_PMU_OTHER(x) ___I915_PMU_OTHER(0, x)
>>>>
>>>> #define I915_PMU_ACTUAL_FREQUENCY	__I915_PMU_OTHER(0)
>>>> #define I915_PMU_REQUESTED_FREQUENCY	__I915_PMU_OTHER(1)
>>>>@@ -290,6 +299,12 @@ enum drm_i915_pmu_engine_sample {
>>>>
>>>> #define I915_PMU_LAST /* Deprecated - do not use */ I915_PMU_RC6_RESIDENCY
>>>>
>>>>+#define __I915_PMU_ACTUAL_FREQUENCY(gt)		___I915_PMU_OTHER(gt, 0)
>>>>+#define __I915_PMU_REQUESTED_FREQUENCY(gt)	___I915_PMU_OTHER(gt, 1)
>>>>+#define __I915_PMU_INTERRUPTS(gt)		___I915_PMU_OTHER(gt, 2)
>>>>+#define __I915_PMU_RC6_RESIDENCY(gt)		___I915_PMU_OTHER(gt, 3)
>>>>+#define __I915_PMU_SOFTWARE_GT_AWAKE_TIME(gt)	___I915_PMU_OTHER(gt, 4)
>>>>+
>>>> /* Each region is a minimum of 16k, and there are at most 255 of them.
>>>>  */
>>>> #define I915_NR_TEX_REGIONS 255	/* table size 2k - maximum due to use
>>>>--
>>>>2.36.1
>>>>
>>>
>>>Above comments are mostly nits so after addressing the above comments, this
>>>is:
>>>
>>>Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
>>
>>Well you found some ugly bits (or I got confused, double check me 
>>please) so I'd say hold off with r-b just yet. Sadly it's on Umesh 
>>now to fix up my mess. :I
>
>I don't see anything wrong with the SECTION I marked above. As in, the 
>pmu_needs_timer and the sampling code for events that need to be 
>sampled. If you agree, I can spin the next revision.
>
>Thanks,
>Umesh
>
>>
>>Regards,
>>
>>Tvrtko


* Re: [Intel-gfx] [PATCH 4/6] drm/i915/pmu: Add reference counting to the sampling timer
  2023-05-12 22:29   ` Dixit, Ashutosh
@ 2023-05-12 22:44     ` Umesh Nerlige Ramappa
  2023-05-12 23:20       ` Dixit, Ashutosh
  0 siblings, 1 reply; 45+ messages in thread
From: Umesh Nerlige Ramappa @ 2023-05-12 22:44 UTC (permalink / raw)
  To: Dixit, Ashutosh; +Cc: intel-gfx

On Fri, May 12, 2023 at 03:29:03PM -0700, Dixit, Ashutosh wrote:
>On Fri, 05 May 2023 17:58:14 -0700, Umesh Nerlige Ramappa wrote:
>>
>
>Hi Umesh/Tvrtko,
>
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> We do not want to have timers per tile and waste CPU cycles and energy via
>> multiple wake-up sources, for a relatively unimportant task of PMU
>> sampling, so keeping a single timer works well. But we also do not want
>> the first GT which goes idle to turn off the timer.
>>
>> Add some reference counting, via a mask of unparked GTs, to solve this.
>>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
>> ---
>>  drivers/gpu/drm/i915/i915_pmu.c | 12 ++++++++++--
>>  drivers/gpu/drm/i915/i915_pmu.h |  4 ++++
>>  2 files changed, 14 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
>> index 2b63ee31e1b3..669a42e44082 100644
>> --- a/drivers/gpu/drm/i915/i915_pmu.c
>> +++ b/drivers/gpu/drm/i915/i915_pmu.c
>> @@ -251,7 +251,9 @@ void i915_pmu_gt_parked(struct intel_gt *gt)
>>	 * Signal sampling timer to stop if only engine events are enabled and
>>	 * GPU went idle.
>>	 */
>> -	pmu->timer_enabled = pmu_needs_timer(pmu, false);
>> +	pmu->unparked &= ~BIT(gt->info.id);
>> +	if (pmu->unparked == 0)
>> +		pmu->timer_enabled = pmu_needs_timer(pmu, false);
>>
>>	spin_unlock_irq(&pmu->lock);
>>  }
>> @@ -268,7 +270,10 @@ void i915_pmu_gt_unparked(struct intel_gt *gt)
>>	/*
>>	 * Re-enable sampling timer when GPU goes active.
>>	 */
>> -	__i915_pmu_maybe_start_timer(pmu);
>> +	if (pmu->unparked == 0)
>> +		__i915_pmu_maybe_start_timer(pmu);
>> +
>> +	pmu->unparked |= BIT(gt->info.id);
>>
>>	spin_unlock_irq(&pmu->lock);
>>  }
>> @@ -438,6 +443,9 @@ static enum hrtimer_restart i915_sample(struct hrtimer *hrtimer)
>>	 */
>>
>>	for_each_gt(gt, i915, i) {
>> +		if (!(pmu->unparked & BIT(i)))
>> +			continue;
>> +
>
>This is not correct. In this series we are at least sampling frequencies
>(calling frequency_sample) even when GT is parked. So these 3 lines should be
>deleted. engines_sample will get called and will return without doing
>anything if engine events are disabled.

Not sure I understand. This is checking pmu->'un'parked bits.

Thanks,
Umesh
>
>Thanks.
>--
>Ashutosh
>
>
>>		engines_sample(gt, period_ns);
>>
>>		if (i == 0) /* FIXME */
>> diff --git a/drivers/gpu/drm/i915/i915_pmu.h b/drivers/gpu/drm/i915/i915_pmu.h
>> index a686fd7ccedf..3a811266ac6a 100644
>> --- a/drivers/gpu/drm/i915/i915_pmu.h
>> +++ b/drivers/gpu/drm/i915/i915_pmu.h
>> @@ -76,6 +76,10 @@ struct i915_pmu {
>>	 * @lock: Lock protecting enable mask and ref count handling.
>>	 */
>>	spinlock_t lock;
>> +	/**
>> +	 * @unparked: GT unparked mask.
>> +	 */
>> +	unsigned int unparked;
>>	/**
>>	 * @timer: Timer for internal i915 PMU sampling.
>>	 */
>> --
>> 2.36.1
>>


* Re: [Intel-gfx] [PATCH 4/6] drm/i915/pmu: Add reference counting to the sampling timer
  2023-05-12 22:44     ` Umesh Nerlige Ramappa
@ 2023-05-12 23:20       ` Dixit, Ashutosh
  2023-05-12 23:44         ` Umesh Nerlige Ramappa
  0 siblings, 1 reply; 45+ messages in thread
From: Dixit, Ashutosh @ 2023-05-12 23:20 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-gfx

On Fri, 12 May 2023 15:44:00 -0700, Umesh Nerlige Ramappa wrote:
>
> On Fri, May 12, 2023 at 03:29:03PM -0700, Dixit, Ashutosh wrote:
> > On Fri, 05 May 2023 17:58:14 -0700, Umesh Nerlige Ramappa wrote:
> >>
> >
> > Hi Umesh/Tvrtko,
> >
> >> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>
> >> We do not want to have timers per tile and waste CPU cycles and energy via
> >> multiple wake-up sources, for a relatively unimportant task of PMU
> >> sampling, so keeping a single timer works well. But we also do not want
> >> the first GT which goes idle to turn off the timer.
> >>
> >> Add some reference counting, via a mask of unparked GTs, to solve this.
> >>
> >> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
> >> ---
> >>  drivers/gpu/drm/i915/i915_pmu.c | 12 ++++++++++--
> >>  drivers/gpu/drm/i915/i915_pmu.h |  4 ++++
> >>  2 files changed, 14 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
> >> index 2b63ee31e1b3..669a42e44082 100644
> >> --- a/drivers/gpu/drm/i915/i915_pmu.c
> >> +++ b/drivers/gpu/drm/i915/i915_pmu.c
> >> @@ -251,7 +251,9 @@ void i915_pmu_gt_parked(struct intel_gt *gt)
> >>	 * Signal sampling timer to stop if only engine events are enabled and
> >>	 * GPU went idle.
> >>	 */
> >> -	pmu->timer_enabled = pmu_needs_timer(pmu, false);
> >> +	pmu->unparked &= ~BIT(gt->info.id);
> >> +	if (pmu->unparked == 0)
> >> +		pmu->timer_enabled = pmu_needs_timer(pmu, false);
> >>
> >>	spin_unlock_irq(&pmu->lock);
> >>  }
> >> @@ -268,7 +270,10 @@ void i915_pmu_gt_unparked(struct intel_gt *gt)
> >>	/*
> >>	 * Re-enable sampling timer when GPU goes active.
> >>	 */
> >> -	__i915_pmu_maybe_start_timer(pmu);
> >> +	if (pmu->unparked == 0)
> >> +		__i915_pmu_maybe_start_timer(pmu);
> >> +
> >> +	pmu->unparked |= BIT(gt->info.id);
> >>
> >>	spin_unlock_irq(&pmu->lock);
> >>  }
> >> @@ -438,6 +443,9 @@ static enum hrtimer_restart i915_sample(struct hrtimer *hrtimer)
> >>	 */
> >>
> >>	for_each_gt(gt, i915, i) {
> >> +		if (!(pmu->unparked & BIT(i)))
> >> +			continue;
> >> +
> >
> > This is not correct. In this series we are at least sampling frequencies
> > (calling frequency_sample) even when GT is parked. So these 3 lines should be
> > deleted. engines_sample will get called and will return without doing
> > anything if engine events are disabled.
>
> Not sure I understand. This is checking pmu->'un'parked bits.

Sorry, my bad. Not "engines_sample will get called and will return without
doing anything if engine events are disabled" but "engines_sample will get
called and will return without doing anything if GT is not awake". This is
the same as the previous behavior before this series.
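
A minimal model of the sampling loop without the unparked-mask skip. All names are hypothetical, not i915 types: engine sampling bails out on its own when a GT is not awake, while frequency sampling is still reached for every GT.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-GT state and call counters. */
struct sk_gt {
	bool awake;
	unsigned int engine_samples; /* engine busyness actually sampled */
	unsigned int freq_calls;     /* frequency_sample() invocations */
};

static void sk_engines_sample(struct sk_gt *gt)
{
	/* Like engines_sample(): nothing to do unless the GT is awake. */
	if (!gt->awake)
		return;
	gt->engine_samples++;
}

static void sk_frequency_sample(struct sk_gt *gt)
{
	/* Reached for every GT; the real function handles the parked case. */
	gt->freq_calls++;
}

/* Timer body without the `if (!(pmu->unparked & BIT(i))) continue;` skip. */
static void sk_sample_all(struct sk_gt *gts, unsigned int n)
{
	for (unsigned int i = 0; i < n; i++) {
		sk_engines_sample(&gts[i]);
		sk_frequency_sample(&gts[i]);
	}
}
```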

Umesh and I discussed this, but I am writing it out in case Tvrtko takes
a look.

Thanks.
--
Ashutosh



> >
> >
> >>		engines_sample(gt, period_ns);
> >>
> >>		if (i == 0) /* FIXME */
> >> diff --git a/drivers/gpu/drm/i915/i915_pmu.h b/drivers/gpu/drm/i915/i915_pmu.h
> >> index a686fd7ccedf..3a811266ac6a 100644
> >> --- a/drivers/gpu/drm/i915/i915_pmu.h
> >> +++ b/drivers/gpu/drm/i915/i915_pmu.h
> >> @@ -76,6 +76,10 @@ struct i915_pmu {
> >>	 * @lock: Lock protecting enable mask and ref count handling.
> >>	 */
> >>	spinlock_t lock;
> >> +	/**
> >> +	 * @unparked: GT unparked mask.
> >> +	 */
> >> +	unsigned int unparked;
> >>	/**
> >>	 * @timer: Timer for internal i915 PMU sampling.
> >>	 */
> >> --
> >> 2.36.1
> >>


* Re: [Intel-gfx] [PATCH 4/6] drm/i915/pmu: Add reference counting to the sampling timer
  2023-05-12 23:20       ` Dixit, Ashutosh
@ 2023-05-12 23:44         ` Umesh Nerlige Ramappa
  2023-05-15  9:52           ` Tvrtko Ursulin
  0 siblings, 1 reply; 45+ messages in thread
From: Umesh Nerlige Ramappa @ 2023-05-12 23:44 UTC (permalink / raw)
  To: Dixit, Ashutosh; +Cc: intel-gfx

On Fri, May 12, 2023 at 04:20:19PM -0700, Dixit, Ashutosh wrote:
>On Fri, 12 May 2023 15:44:00 -0700, Umesh Nerlige Ramappa wrote:
>>
>> On Fri, May 12, 2023 at 03:29:03PM -0700, Dixit, Ashutosh wrote:
>> > On Fri, 05 May 2023 17:58:14 -0700, Umesh Nerlige Ramappa wrote:
>> >>
>> >
>> > Hi Umesh/Tvrtko,
>> >
>> >> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> >>
>> >> We do not want to have timers per tile and waste CPU cycles and energy via
>> >> multiple wake-up sources, for a relatively unimportant task of PMU
>> >> sampling, so keeping a single timer works well. But we also do not want
>> >> the first GT which goes idle to turn off the timer.
>> >>
>> >> Add some reference counting, via a mask of unparked GTs, to solve this.
>> >>
>> >> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> >> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
>> >> ---
>> >>  drivers/gpu/drm/i915/i915_pmu.c | 12 ++++++++++--
>> >>  drivers/gpu/drm/i915/i915_pmu.h |  4 ++++
>> >>  2 files changed, 14 insertions(+), 2 deletions(-)
>> >>
>> >> diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
>> >> index 2b63ee31e1b3..669a42e44082 100644
>> >> --- a/drivers/gpu/drm/i915/i915_pmu.c
>> >> +++ b/drivers/gpu/drm/i915/i915_pmu.c
>> >> @@ -251,7 +251,9 @@ void i915_pmu_gt_parked(struct intel_gt *gt)
>> >>	 * Signal sampling timer to stop if only engine events are enabled and
>> >>	 * GPU went idle.
>> >>	 */
>> >> -	pmu->timer_enabled = pmu_needs_timer(pmu, false);
>> >> +	pmu->unparked &= ~BIT(gt->info.id);
>> >> +	if (pmu->unparked == 0)
>> >> +		pmu->timer_enabled = pmu_needs_timer(pmu, false);
>> >>
>> >>	spin_unlock_irq(&pmu->lock);
>> >>  }
>> >> @@ -268,7 +270,10 @@ void i915_pmu_gt_unparked(struct intel_gt *gt)
>> >>	/*
>> >>	 * Re-enable sampling timer when GPU goes active.
>> >>	 */
>> >> -	__i915_pmu_maybe_start_timer(pmu);
>> >> +	if (pmu->unparked == 0)
>> >> +		__i915_pmu_maybe_start_timer(pmu);
>> >> +
>> >> +	pmu->unparked |= BIT(gt->info.id);
>> >>
>> >>	spin_unlock_irq(&pmu->lock);
>> >>  }
>> >> @@ -438,6 +443,9 @@ static enum hrtimer_restart i915_sample(struct hrtimer *hrtimer)
>> >>	 */
>> >>
>> >>	for_each_gt(gt, i915, i) {
>> >> +		if (!(pmu->unparked & BIT(i)))
>> >> +			continue;
>> >> +
>> >
>> > This is not correct. In this series we are at least sampling frequencies
>> > (calling frequency_sample) even when GT is parked. So these 3 lines should be
>> > deleted. engines_sample will get called and will return without doing
>> > anything if engine events are disabled.
>>
>> Not sure I understand. This is checking pmu->'un'parked bits.
>
>Sorry, my bad. Not "engines_sample will get called and will return without
>doing anything if engine events are disabled" but "engines_sample will get
>called and will return without doing anything if GT is not awake". This is
>the same as the previous behavior before this series.
>
>Umesh and I discussed this but writing this out in case Tvrtko takes a
>look.

Sounds good. Dropping the check here in the new revision.
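For reference, here is a minimal user-space model of how the unparked mask acts as a reference count for the single sampling timer. It is a sketch, not the driver code. The struct and function names are hypothetical and locking is omitted. It also assumes that only engine events are enabled, so the timer is needed while any GT is awake, and pmu_needs_timer(pmu, false) would return false once every GT has parked.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the relevant i915_pmu fields; pmu->lock is not modelled. */
struct pmu_park_model {
	unsigned int unparked;      /* bit N set => GT N is awake */
	bool timer_enabled;
	unsigned int timer_starts;  /* how many times the shared timer was started */
};

static struct pmu_park_model park_demo;

static void gt_unparked(struct pmu_park_model *pmu, unsigned int gt_id)
{
	/* Only the first GT to wake up needs to (re)start the shared timer. */
	if (pmu->unparked == 0) {
		pmu->timer_enabled = true;
		pmu->timer_starts++;
	}
	pmu->unparked |= 1u << gt_id;
}

static void gt_parked(struct pmu_park_model *pmu, unsigned int gt_id)
{
	pmu->unparked &= ~(1u << gt_id);
	/* Only the last GT to park may turn the timer off (engine-only events assumed). */
	if (pmu->unparked == 0)
		pmu->timer_enabled = false;
}
```

Waking GT0 and then GT1 starts the timer once; parking GT0 leaves it running, and it only stops after GT1 parks as well. With the check dropped from i915_sample(), the mask only gates starting and stopping the timer, while the per-GT decisions stay in engines_sample() and frequency_sample().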

Thanks,
Umesh
>
>Thanks.
>--
>Ashutosh
>
>
>
>> >
>> >
>> >>		engines_sample(gt, period_ns);
>> >>
>> >>		if (i == 0) /* FIXME */
>> >> diff --git a/drivers/gpu/drm/i915/i915_pmu.h b/drivers/gpu/drm/i915/i915_pmu.h
>> >> index a686fd7ccedf..3a811266ac6a 100644
>> >> --- a/drivers/gpu/drm/i915/i915_pmu.h
>> >> +++ b/drivers/gpu/drm/i915/i915_pmu.h
>> >> @@ -76,6 +76,10 @@ struct i915_pmu {
>> >>	 * @lock: Lock protecting enable mask and ref count handling.
>> >>	 */
>> >>	spinlock_t lock;
>> >> +	/**
>> >> +	 * @unparked: GT unparked mask.
>> >> +	 */
>> >> +	unsigned int unparked;
>> >>	/**
>> >>	 * @timer: Timer for internal i915 PMU sampling.
>> >>	 */
>> >> --
>> >> 2.36.1
>> >>


* Re: [Intel-gfx] [PATCH 5/6] drm/i915/pmu: Prepare for multi-tile non-engine counters
  2023-05-12 20:57       ` Umesh Nerlige Ramappa
  2023-05-12 22:37         ` Umesh Nerlige Ramappa
@ 2023-05-13  1:09         ` Dixit, Ashutosh
  2023-05-15 10:10         ` Tvrtko Ursulin
  2 siblings, 0 replies; 45+ messages in thread
From: Dixit, Ashutosh @ 2023-05-13  1:09 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-gfx

On Fri, 12 May 2023 13:57:59 -0700, Umesh Nerlige Ramappa wrote:
>
> On Fri, May 12, 2023 at 11:56:18AM +0100, Tvrtko Ursulin wrote:
> >
> > On 12/05/2023 02:08, Dixit, Ashutosh wrote:
> >> On Fri, 05 May 2023 17:58:15 -0700, Umesh Nerlige Ramappa wrote:
> >>>
> >>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>>
> >>> Reserve some bits in the counter config namespace which will carry the
> >>> tile id and prepare the code to handle this.
> >>>
> >>> No per tile counters have been added yet.
> >>>
> >>> v2:
> >>> - Fix checkpatch issues
> >>> - Use 4 bits for gt id in non-engine counters. Drop FIXME.
> >>> - Set MAX GTs to 4. Drop FIXME.
> >>>
> >>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
> >>> ---
> >>>  drivers/gpu/drm/i915/i915_pmu.c | 150 +++++++++++++++++++++++---------
> >>>  drivers/gpu/drm/i915/i915_pmu.h |   9 +-
> >>>  include/uapi/drm/i915_drm.h     |  17 +++-
> >>>  3 files changed, 129 insertions(+), 47 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
> >>> index 669a42e44082..12b2f3169abf 100644
> >>> --- a/drivers/gpu/drm/i915/i915_pmu.c
> >>> +++ b/drivers/gpu/drm/i915/i915_pmu.c
> >>> @@ -56,11 +56,21 @@ static bool is_engine_config(u64 config)
> >>>	return config < __I915_PMU_OTHER(0);
> >>>  }
> >>>
> >>> +static unsigned int config_gt_id(const u64 config)
> >>> +{
> >>> +	return config >> __I915_PMU_GT_SHIFT;
> >>> +}
> >>> +
> >>> +static u64 config_counter(const u64 config)
> >>> +{
> >>> +	return config & ~(~0ULL << __I915_PMU_GT_SHIFT);
> >>
> >> ok, but another possibility:
> >>
> >>	return config & ~REG_GENMASK64(63, __I915_PMU_GT_SHIFT);
> >
> > It's not a register so no. :) GENMASK_ULL maybe but meh.
>
> leaving as is.
>
> >
> >>> +}
> >>> +
> >>>  static unsigned int other_bit(const u64 config)
> >>>  {
> >>>	unsigned int val;
> >>>
> >>> -	switch (config) {
> >>> +	switch (config_counter(config)) {
> >>>	case I915_PMU_ACTUAL_FREQUENCY:
> >>>		val =  __I915_PMU_ACTUAL_FREQUENCY_ENABLED;
> >>>		break;
> >>> @@ -78,15 +88,20 @@ static unsigned int other_bit(const u64 config)
> >>>		return -1;
> >>>	}
> >>>
> >>> -	return I915_ENGINE_SAMPLE_COUNT + val;
> >>> +	return I915_ENGINE_SAMPLE_COUNT +
> >>> +	       config_gt_id(config) * __I915_PMU_TRACKED_EVENT_COUNT +
> >>> +	       val;
> >>>  }
> >>>
> >>>  static unsigned int config_bit(const u64 config)
> >>>  {
> >>> -	if (is_engine_config(config))
> >>> +	if (is_engine_config(config)) {
> >>> +		GEM_BUG_ON(config_gt_id(config));
> >>
> >> This GEM_BUG_ON is not needed since:
> >>
> >>	static bool is_engine_config(u64 config)
> >>	{
> >>		return config < __I915_PMU_OTHER(0);
> >>	}
> >
> > True!
>
> dropping BUG_ON
>
> >
> >>> +
> >>>		return engine_config_sample(config);
> >>> -	else
> >>> +	} else {
> >>>		return other_bit(config);
> >>> +	}
> >>>  }
> >>>
> >>>  static u64 config_mask(u64 config)
> >>> @@ -104,6 +119,18 @@ static unsigned int event_bit(struct perf_event *event)
> >>>	return config_bit(event->attr.config);
> >>>  }
> >>>
> >>> +static u64 frequency_enabled_mask(void)
> >>> +{
> >>> +	unsigned int i;
> >>> +	u64 mask = 0;
> >>> +
> >>> +	for (i = 0; i < I915_PMU_MAX_GTS; i++)
> >>> +		mask |= config_mask(__I915_PMU_ACTUAL_FREQUENCY(i)) |
> >>> +			config_mask(__I915_PMU_REQUESTED_FREQUENCY(i));
> >>> +
> >>> +	return mask;
> >>> +}
> >>> +
> >>>  static bool pmu_needs_timer(struct i915_pmu *pmu, bool gpu_active)
> >>>  {
> >>>	struct drm_i915_private *i915 = container_of(pmu, typeof(*i915), pmu);
> >>> @@ -120,9 +147,7 @@ static bool pmu_needs_timer(struct i915_pmu *pmu, bool gpu_active)
> >>>	 * Mask out all the ones which do not need the timer, or in
> >>>	 * other words keep all the ones that could need the timer.
> >>>	 */
> >>> -	enable &= config_mask(I915_PMU_ACTUAL_FREQUENCY) |
> >>> -		  config_mask(I915_PMU_REQUESTED_FREQUENCY) |
> >>> -		  ENGINE_SAMPLE_MASK;
> >>> +	enable &= frequency_enabled_mask() | ENGINE_SAMPLE_MASK;
> >>
> >> u32 enable & u64 frequency_enabled_mask
> >>
> >> ugly but ok I guess? Or change enable to u64?
>
> making pmu->enable u64 as well as other places where it is assigned to
> local variables.

Yes, that's the way to do it.
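Here is a small sketch of why the u64 change matters. With a u32 field, both the "|= BIT_ULL(bit)" store and the "&" with a u64 mask silently lose anything above bit 31. BIT_ULL below is a local stand-in for the kernel macro, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in for the kernel's BIT_ULL(). */
#define BIT_ULL(n) (1ULL << (n))

/* Models "pmu->enable |= BIT_ULL(bit)" when pmu->enable is a u32. */
static uint32_t set_bit_u32(uint32_t enable, unsigned int bit)
{
	/* The conversion back to 32 bits drops bits 32..63 without a warning. */
	return (uint32_t)(enable | BIT_ULL(bit));
}

/* Models "pmu->enable & config_mask(...)" with a u32 enable field. */
static int enabled_u32(uint32_t enable, uint64_t mask)
{
	/* enable is zero-extended, so mask bits 32..63 can never match. */
	return (enable & mask) != 0;
}

/* Same test with a u64 enable field, as proposed. */
static int enabled_u64(uint64_t enable, uint64_t mask)
{
	return (enable & mask) != 0;
}
```

Since the bit indices in this series stay below 32, nothing was lost in practice, but the u64 field removes the trap for any future counters.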

>
> >
> > Hmm.. yes very ugly. Could have been an accident which happened to work
> > because there is a single timer (not per tile).
>
> Happened to work because the frequency mask does not spill over to the
> upper 32 bits (even for multi tile).

Even with 4 tiles, I checked.
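The bit budget can be checked with the same arithmetic other_bit() uses. This sketch assumes I915_ENGINE_SAMPLE_COUNT = 3 (busy, wait, sema), __I915_PMU_TRACKED_EVENT_COUNT = 3 (actual frequency, requested frequency, RC6) and 4 GTs, so please double check these values against the headers. The helper names are hypothetical.

```c
#include <assert.h>

/* Assumed values; verify against i915_pmu.h and i915_drm.h. */
#define ENGINE_SAMPLE_COUNT 3u /* I915_ENGINE_SAMPLE_COUNT */
#define TRACKED_EVENT_COUNT 3u /* __I915_PMU_TRACKED_EVENT_COUNT */
#define MAX_GTS             4u /* I915_PMU_MAX_GTS in this revision */

/* Same layout as other_bit(): one block of tracked events per GT after the engine bits. */
static unsigned int other_bit_index(unsigned int gt_id, unsigned int val)
{
	assert(gt_id < MAX_GTS);
	assert(val < TRACKED_EVENT_COUNT);
	return ENGINE_SAMPLE_COUNT + gt_id * TRACKED_EVENT_COUNT + val;
}

/* Highest bit index any non-engine event can occupy. */
static unsigned int max_other_bit(void)
{
	return other_bit_index(MAX_GTS - 1, TRACKED_EVENT_COUNT - 1);
}
```

Under those assumptions the highest index is 3 + 3 * 3 + 2 = 14, so the 32-bit enable mask had room to spare.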

>
> --------------------- START_SECTION ----------------
> >
> > Similar issue in frequency_sampling_enabled too. Gt_id argument to it
> > seems pointless.
>
> Not sure why it's pointless. We need the gt_id to determine the right mask
> for that specific gt. If it's not enabled, then we just return without
> pm_get and async put (like you mention later).
> And this piece of code is called within for_each_gt.
>
> >
> > So I now think whole frequency_enabled_mask() is just pointless and
> > should be removed. And then pmu_needs_timer code can stay as is. Possibly
> > add a config_mask_32 helper which ensures no bits in upper 32 bits are
> > returned.
> >
> > That is if we are happy for the frequency_sampling_enabled returning true
> > for all gts, regardless of which ones actually have frequency sampling
> > enabled.
> >
> > Or if we want to implement it as I probably have intended, we will need
> > to add some gt bits into pmu->enable. Maybe reserve top four same as with
> > config counters.
>
> Nope. What you have here works just fine. pmu->enable should not include
> any gt id info. gt_id[63:60] is only a concept for pmu config sent by user.
> config_mask and pmu->enable are i915 internal bookkeeping (bit masks) just
> to track what events need to be sampled.  The 'other' bit masks are a
> function of gt_id because we use gt_id to calculate a contiguous numerical
> value for these 'other' events. That's all. Once the numerical value is
> calculated, there is no need for gt_id because config_mask is
> BIT_ULL(numerical_value). Since the numerical values never exceeded 31
> (even for multi-gts), everything worked even with 32 bit pmu->enable.

Yeah, agree with Umesh, I also didn't follow what Tvrtko was saying here.
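To illustrate the split Umesh describes, here is a user-space sketch of the uapi encoding. The GT id sits in bits 63:60 of the config, and config_gt_id()/config_counter() strip it back out before anything touches pmu->enable. OTHER_BASE stands for __I915_PMU_ENGINE(0xff, 0xff, 0xf) + 1; its value (0x100000) is an assumption based on the current engine encoding. make_other() is a hypothetical equivalent of ___I915_PMU_OTHER().

```c
#include <assert.h>
#include <stdint.h>

#define PMU_GT_SHIFT 60 /* __I915_PMU_GT_SHIFT */

/* Assumed value of __I915_PMU_ENGINE(0xff, 0xff, 0xf) + 1. */
#define OTHER_BASE 0x100000ULL

/* Hypothetical equivalent of ___I915_PMU_OTHER(gt, x). */
static uint64_t make_other(unsigned int gt, uint64_t x)
{
	return (OTHER_BASE + x) | ((uint64_t)gt << PMU_GT_SHIFT);
}

/* Top 4 bits: which GT the user asked about. */
static unsigned int config_gt_id(uint64_t config)
{
	return (unsigned int)(config >> PMU_GT_SHIFT);
}

/* Low 60 bits: the counter itself, identical for every GT. */
static uint64_t config_counter(uint64_t config)
{
	return config & ~(~0ULL << PMU_GT_SHIFT);
}
```

Decoding make_other(2, 3) gives GT id 2 and the same counter value as the GT0 event, which is why the GT id only needs to survive long enough to pick the bit index.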

>
> >
> > In this case the config_mask needs to be updated to translate not just
> > the config counter into the pmu tracked event bits, but config counter gt
> > id into the pmu->enabled gt id.
> >
> > Sounds easily doable on a first thought.
>
> ------------------------ END_SECTION ----------------
>
>
> >>>
> >>>	/*
> >>>	 * When the GPU is idle per-engine counters do not need to be
> >>> @@ -164,9 +189,37 @@ static inline s64 ktime_since_raw(const ktime_t kt)
> >>>	return ktime_to_ns(ktime_sub(ktime_get_raw(), kt));
> >>>  }
> >>>
> >>> +static unsigned int
> >>> +__sample_idx(struct i915_pmu *pmu, unsigned int gt_id, int sample)
> >>> +{
> >>> +	unsigned int idx = gt_id * __I915_NUM_PMU_SAMPLERS + sample;
> >>> +
> >>> +	GEM_BUG_ON(idx >= ARRAY_SIZE(pmu->sample));
> >>
> >> Does this GEM_BUG_ON need to be split up as follows:
> >>
> >>	GEM_BUG_ON(gt_id >= I915_PMU_MAX_GTS);
> >>	GEM_BUG_ON(sample >= __I915_NUM_PMU_SAMPLERS);
> >>
> >> Since that is what we really mean here isn't it?
> >
> > ARRAY_SIZE check seems the safest option to me, given it is defined as:
> >
> > sample[I915_PMU_MAX_GTS * __I915_NUM_PMU_SAMPLERS];
> >
> > What problem do you see here?

OK to leave as is.
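For the record, here is a sketch of the flat indexing with the single ARRAY_SIZE() check. The sizes are assumptions (4 GTs and 4 samplers: actual frequency, requested frequency, RC6, RC6 last reported), and the names are hypothetical. The check catches any index past the end of the array, but an out-of-range sample with a low gt_id still lands inside it and aliases another GT's slot. The split checks would catch that case as well.

```c
#include <assert.h>
#include <stdint.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Assumed sizes: I915_PMU_MAX_GTS and __I915_NUM_PMU_SAMPLERS. */
#define MAX_GTS      4u
#define NUM_SAMPLERS 4u

struct pmu_sample_model {
	uint64_t sample[MAX_GTS * NUM_SAMPLERS];
};

static struct pmu_sample_model pmu_sample_demo;

static unsigned int sample_idx(const struct pmu_sample_model *pmu,
			       unsigned int gt_id, unsigned int sample)
{
	unsigned int idx = gt_id * NUM_SAMPLERS + sample;

	/* Mirrors the GEM_BUG_ON in __sample_idx(): only the total bound is checked. */
	assert(idx < ARRAY_SIZE(pmu->sample));
	return idx;
}
```

For example, sample_idx(&pmu_sample_demo, 0, 5) and sample_idx(&pmu_sample_demo, 1, 1) both return 5.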

> >
> >>> +
> >>> +	return idx;
> >>> +}
> >>> +
> >>> +static u64 read_sample(struct i915_pmu *pmu, unsigned int gt_id, int sample)
> >>> +{
> >>> +	return pmu->sample[__sample_idx(pmu, gt_id, sample)].cur;
> >>> +}
> >>> +
> >>> +static void
> >>> +store_sample(struct i915_pmu *pmu, unsigned int gt_id, int sample, u64 val)
> >>> +{
> >>> +	pmu->sample[__sample_idx(pmu, gt_id, sample)].cur = val;
> >>> +}
> >>> +
> >>> +static void
> >>> +add_sample_mult(struct i915_pmu *pmu, unsigned int gt_id, int sample, u32 val, u32 mul)
> >>> +{
> >>> +	pmu->sample[__sample_idx(pmu, gt_id, sample)].cur += mul_u32_u32(val, mul);
> >>> +}
> >>
> >> Gripe: I think this code should have per event data structures which store
> >> all information about a particular event. Rather than storing it in these
> >> arrays common to all events (and in bit-fields common to all events) which
> >> results in the kind of dance we have to do here. Anyway too big a change to
> >> make now but something to consider if we ever do this for xe.
> >
> > Could do a two dimensional array like:
> >
> > sample[I915_PMU_MAX_GTS][__I915_NUM_PMU_SAMPLERS];
> >
> > Any better? Honestly I don't remember if there was a special reason I
> > went for a flat array back then.
>
> Maybe we improve it in XE. I am looking for the shortest path to get this
> merged without any functional issues.

I like Tvrtko's 2-d array idea. Anyway Umesh you can leave this as is and I
will submit a follow on patch to fix this up.
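One possible shape for that follow-up, sketched in user space with hypothetical names and the same assumed sizes: a 2-D array lets each index be checked against its own dimension. Because C lays out a 2-D array in row-major order, the memory layout is identical to the flat gt_id * NUM_SAMPLERS + sample indexing, so this would be a readability change only.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed sizes: I915_PMU_MAX_GTS and __I915_NUM_PMU_SAMPLERS. */
#define MAX_GTS      4u
#define NUM_SAMPLERS 4u

/* Hypothetical 2-D replacement for the flat pmu->sample[] array. */
struct pmu_model_2d {
	uint64_t sample[MAX_GTS][NUM_SAMPLERS];
};

static struct pmu_model_2d pmu2d_demo;

static uint64_t *sample_slot(struct pmu_model_2d *pmu, unsigned int gt_id,
			     unsigned int sample)
{
	/* Each index is now validated against its own bound. */
	assert(gt_id < MAX_GTS);
	assert(sample < NUM_SAMPLERS);
	return &pmu->sample[gt_id][sample];
}

static void add_sample_mult(struct pmu_model_2d *pmu, unsigned int gt_id,
			    unsigned int sample, uint32_t val, uint32_t mul)
{
	/* Widen before multiplying, as mul_u32_u32() does in the kernel. */
	*sample_slot(pmu, gt_id, sample) += (uint64_t)val * mul;
}
```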

>
> >
> >>
> >>> +
> >>>  static u64 get_rc6(struct intel_gt *gt)
> >>>  {
> >>>	struct drm_i915_private *i915 = gt->i915;
> >>> +	const unsigned int gt_id = gt->info.id;
> >>>	struct i915_pmu *pmu = &i915->pmu;
> >>>	unsigned long flags;
> >>>	bool awake = false;
> >>> @@ -181,7 +234,7 @@ static u64 get_rc6(struct intel_gt *gt)
> >>>	spin_lock_irqsave(&pmu->lock, flags);
> >>>
> >>>	if (awake) {
> >>> -		pmu->sample[__I915_SAMPLE_RC6].cur = val;
> >>> +		store_sample(pmu, gt_id, __I915_SAMPLE_RC6, val);
> >>>	} else {
> >>>		/*
> >>>		 * We think we are runtime suspended.
> >>> @@ -190,14 +243,14 @@ static u64 get_rc6(struct intel_gt *gt)
> >>>		 * on top of the last known real value, as the approximated RC6
> >>>		 * counter value.
> >>>		 */
> >>> -		val = ktime_since_raw(pmu->sleep_last);
> >>> -		val += pmu->sample[__I915_SAMPLE_RC6].cur;
> >>> +		val = ktime_since_raw(pmu->sleep_last[gt_id]);
> >>> +		val += read_sample(pmu, gt_id, __I915_SAMPLE_RC6);
> >>>	}
> >>>
> >>> -	if (val < pmu->sample[__I915_SAMPLE_RC6_LAST_REPORTED].cur)
> >>> -		val = pmu->sample[__I915_SAMPLE_RC6_LAST_REPORTED].cur;
> >>> +	if (val < read_sample(pmu, gt_id, __I915_SAMPLE_RC6_LAST_REPORTED))
> >>> +		val = read_sample(pmu, gt_id, __I915_SAMPLE_RC6_LAST_REPORTED);
> >>>	else
> >>> -		pmu->sample[__I915_SAMPLE_RC6_LAST_REPORTED].cur = val;
> >>> +		store_sample(pmu, gt_id, __I915_SAMPLE_RC6_LAST_REPORTED, val);
> >>>
> >>>	spin_unlock_irqrestore(&pmu->lock, flags);
> >>>
> >>> @@ -207,13 +260,20 @@ static u64 get_rc6(struct intel_gt *gt)
> >>>  static void init_rc6(struct i915_pmu *pmu)
> >>>  {
> >>>	struct drm_i915_private *i915 = container_of(pmu, typeof(*i915), pmu);
> >>> -	intel_wakeref_t wakeref;
> >>> +	struct intel_gt *gt;
> >>> +	unsigned int i;
> >>> +
> >>> +	for_each_gt(gt, i915, i) {
> >>> +		intel_wakeref_t wakeref;
> >>>
> >>> -	with_intel_runtime_pm(to_gt(i915)->uncore->rpm, wakeref) {
> >>> -		pmu->sample[__I915_SAMPLE_RC6].cur = __get_rc6(to_gt(i915));
> >>> -		pmu->sample[__I915_SAMPLE_RC6_LAST_REPORTED].cur =
> >>> -					pmu->sample[__I915_SAMPLE_RC6].cur;
> >>> -		pmu->sleep_last = ktime_get_raw();
> >>> +		with_intel_runtime_pm(gt->uncore->rpm, wakeref) {
> >>> +			u64 val = __get_rc6(gt);
> >>> +
> >>> +			store_sample(pmu, i, __I915_SAMPLE_RC6, val);
> >>> +			store_sample(pmu, i, __I915_SAMPLE_RC6_LAST_REPORTED,
> >>> +				     val);
> >>> +			pmu->sleep_last[i] = ktime_get_raw();
> >>> +		}
> >>>	}
> >>>  }
> >>>
> >>> @@ -221,8 +281,8 @@ static void park_rc6(struct intel_gt *gt)
> >>>  {
> >>>	struct i915_pmu *pmu = &gt->i915->pmu;
> >>>
> >>> -	pmu->sample[__I915_SAMPLE_RC6].cur = __get_rc6(gt);
> >>> -	pmu->sleep_last = ktime_get_raw();
> >>> +	store_sample(pmu, gt->info.id, __I915_SAMPLE_RC6, __get_rc6(gt));
> >>> +	pmu->sleep_last[gt->info.id] = ktime_get_raw();
> >>>  }
> >>>
> >>>  static void __i915_pmu_maybe_start_timer(struct i915_pmu *pmu)
> >>> @@ -362,34 +422,30 @@ engines_sample(struct intel_gt *gt, unsigned int period_ns)
> >>>	}
> >>>  }
> >>>
> >>> -static void
> >>> -add_sample_mult(struct i915_pmu_sample *sample, u32 val, u32 mul)
> >>> -{
> >>> -	sample->cur += mul_u32_u32(val, mul);
> >>> -}
> >>> -
> >>> -static bool frequency_sampling_enabled(struct i915_pmu *pmu)
> >>> +static bool
> >>> +frequency_sampling_enabled(struct i915_pmu *pmu, unsigned int gt)
> >>>  {
> >>>	return pmu->enable &
> >>> -	       (config_mask(I915_PMU_ACTUAL_FREQUENCY) |
> >>> -		config_mask(I915_PMU_REQUESTED_FREQUENCY));
> >>> +	       (config_mask(__I915_PMU_ACTUAL_FREQUENCY(gt)) |
> >>> +		config_mask(__I915_PMU_REQUESTED_FREQUENCY(gt)));
> >>
> >> Here again:
> >>
> >>	u32 pmu->enable & u64 config_mask
> >>
> >> Probably ok?
> >>
> >> And also in i915_pmu_enable() we have:
> >>
> >>	pmu->enable |= BIT_ULL(bit);
> >>
> >> So change pmu->enable to u64?
>
> Right, changing to u64
>
> >>
> >>>  }
> >>>
> >>>  static void
> >>>  frequency_sample(struct intel_gt *gt, unsigned int period_ns)
> >>>  {
> >>>	struct drm_i915_private *i915 = gt->i915;
> >>> +	const unsigned int gt_id = gt->info.id;
> >>>	struct i915_pmu *pmu = &i915->pmu;
> >>>	struct intel_rps *rps = &gt->rps;
> >>>
> >>> -	if (!frequency_sampling_enabled(pmu))
> >>> +	if (!frequency_sampling_enabled(pmu, gt_id))
> >>>		return;
> >>
> >> Pre-existing issue, but why do we need this check? This is already checked
> >> in the two individual checks for actual and requested freq below:
> >>
> >>	if (pmu->enable & config_mask(__I915_PMU_ACTUAL_FREQUENCY(gt_id)))
> >>
> >>	and
> >>
> >>	if (pmu->enable & config_mask(__I915_PMU_REQUESTED_FREQUENCY(gt_id)))
> >>
> >> So can we delete frequency_sampling_enabled()? Or is it there to avoid the
> >> overhead of intel_gt_pm_get_if_awake() (which doesn't seem to be much)?
> >
> > I think it was to avoid even getting an already active pm ref if
> > frequency events are not enabled. Timer could be running for instance if
> > only engine wait/sema is enabled. So yeah, just a little bit cheaper than
> > pm get + async put and avoid prolonging the delayed put for no
> > reason. (As the timer races with regular GT pm activities (see
> > mod_delayed_work in __intel_wakeref_put_last).)
>
> leaving as is.
>
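Here is a sketch of the ordering Tvrtko describes, using hypothetical stand-ins for intel_gt_pm_get_if_awake() and the async put. The combined mask test runs first, so when only engine wait/sema events keep the timer running, no wakeref is taken at all. The mask values are arbitrary placeholders.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Counts how often the (hypothetical) wakeref was actually taken. */
static unsigned int pm_gets;

/* Stand-in for intel_gt_pm_get_if_awake(): only succeeds if the GT is awake. */
static bool gt_pm_get_if_awake(bool awake)
{
	if (!awake)
		return false;
	pm_gets++;
	return true;
}

static void frequency_sample_model(uint64_t enable, uint64_t act_mask,
				   uint64_t req_mask, bool awake)
{
	/* Cheap early exit, the role frequency_sampling_enabled() plays. */
	if (!(enable & (act_mask | req_mask)))
		return;

	/* Report nothing while parked. */
	if (!gt_pm_get_if_awake(awake))
		return;

	/* ... sample actual/requested frequency here ... */
	/* The driver then drops the reference with intel_gt_pm_put_async(). */
}
```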
> >
> >>
> >>>
> >>>	/* Report 0/0 (actual/requested) frequency while parked. */
> >>>	if (!intel_gt_pm_get_if_awake(gt))
> >>>		return;
> >>>
> >>> -	if (pmu->enable & config_mask(I915_PMU_ACTUAL_FREQUENCY)) {
> >>> +	if (pmu->enable & config_mask(__I915_PMU_ACTUAL_FREQUENCY(gt_id))) {
> >>>		u32 val;
> >>>
> >>>		/*
> >>> @@ -405,12 +461,12 @@ frequency_sample(struct intel_gt *gt, unsigned int period_ns)
> >>>		if (!val)
> >>>			val = intel_gpu_freq(rps, rps->cur_freq);
> >>>
> >>> -		add_sample_mult(&pmu->sample[__I915_SAMPLE_FREQ_ACT],
> >>> +		add_sample_mult(pmu, gt_id, __I915_SAMPLE_FREQ_ACT,
> >>>				val, period_ns / 1000);
> >>>	}
> >>>
> >>> -	if (pmu->enable & config_mask(I915_PMU_REQUESTED_FREQUENCY)) {
> >>> -		add_sample_mult(&pmu->sample[__I915_SAMPLE_FREQ_REQ],
> >>> +	if (pmu->enable & config_mask(__I915_PMU_REQUESTED_FREQUENCY(gt_id))) {
> >>> +		add_sample_mult(pmu, gt_id, __I915_SAMPLE_FREQ_REQ,
> >>>				intel_rps_get_requested_frequency(rps),
> >>>				period_ns / 1000);
> >>>	}
> >>> @@ -447,9 +503,7 @@ static enum hrtimer_restart i915_sample(struct hrtimer *hrtimer)
> >>>			continue;
> >>>
> >>>		engines_sample(gt, period_ns);
> >>> -
> >>> -		if (i == 0) /* FIXME */
> >>> -			frequency_sample(gt, period_ns);
> >>> +		frequency_sample(gt, period_ns);
> >>>	}
> >>>
> >>>	hrtimer_forward(hrtimer, now, ns_to_ktime(PERIOD));
> >>> @@ -491,7 +545,12 @@ config_status(struct drm_i915_private *i915, u64 config)
> >>>  {
> >>>	struct intel_gt *gt = to_gt(i915);
> >>>
> >>> -	switch (config) {
> >>> +	unsigned int gt_id = config_gt_id(config);
> >>> +
> >>> +	if (gt_id)
> >>> +		return -ENOENT;
> >>
> >> This is just wrong. It is fixed in the next patch:
> >>
> >>	if (gt_id > max_gt_id)
> >>		return -ENOENT;
> >>
> >> But probably should be fixed in this patch itself. Or dropped from this
> >> patch and let it come in in Patch 6, since it's confusing. Though it
> >> probably belongs in this patch.
> >
> > Hmm my thinking was probably to reject gt > 0 in this patch since only
> > the last patch was supposed to be exposing the other tiles. Granted that
> > is not entirely true since this patch already makes access to them
> > available via i915_drm.h. Last patch only makes them discoverable via
> > sysfs.
> >
> > In this case yes, I'd pull in "gt_id > max_gt_id" into this patch. And
> > this hunk from the next patch too:
> >
> >	case I915_PMU_INTERRUPTS:
> > +		if (gt_id)
> > +			return -ENOENT;
> >
>
> pulling in the above snippets from patch 6 to patch 5
>
> >>
> >>> +
> >>> +	switch (config_counter(config)) {
> >>>	case I915_PMU_ACTUAL_FREQUENCY:
> >>>		if (IS_VALLEYVIEW(i915) || IS_CHERRYVIEW(i915))
> >>>			/* Requires a mutex for sampling! */
> >>> @@ -599,22 +658,27 @@ static u64 __i915_pmu_event_read(struct perf_event *event)
> >>>			val = engine->pmu.sample[sample].cur;
> >>>		}
> >>>	} else {
> >>> -		switch (event->attr.config) {
> >>> +		const unsigned int gt_id = config_gt_id(event->attr.config);
> >>> +		const u64 config = config_counter(event->attr.config);
> >>> +
> >>> +		switch (config) {
> >>>		case I915_PMU_ACTUAL_FREQUENCY:
> >>>			val =
> >>> -			   div_u64(pmu->sample[__I915_SAMPLE_FREQ_ACT].cur,
> >>> +			   div_u64(read_sample(pmu, gt_id,
> >>> +					       __I915_SAMPLE_FREQ_ACT),
> >>>				   USEC_PER_SEC /* to MHz */);
> >>>			break;
> >>>		case I915_PMU_REQUESTED_FREQUENCY:
> >>>			val =
> >>> -			   div_u64(pmu->sample[__I915_SAMPLE_FREQ_REQ].cur,
> >>> +			   div_u64(read_sample(pmu, gt_id,
> >>> +					       __I915_SAMPLE_FREQ_REQ),
> >>>				   USEC_PER_SEC /* to MHz */);
> >>>			break;
> >>>		case I915_PMU_INTERRUPTS:
> >>>			val = READ_ONCE(pmu->irq_count);
> >>>			break;
> >>>		case I915_PMU_RC6_RESIDENCY:
> >>> -			val = get_rc6(to_gt(i915));
> >>> +			val = get_rc6(i915->gt[gt_id]);
> >>>			break;
> >>>		case I915_PMU_SOFTWARE_GT_AWAKE_TIME:
> >>>			val = ktime_to_ns(intel_gt_get_awake_time(to_gt(i915)));
> >>> diff --git a/drivers/gpu/drm/i915/i915_pmu.h b/drivers/gpu/drm/i915/i915_pmu.h
> >>> index 3a811266ac6a..d47846f21ddf 100644
> >>> --- a/drivers/gpu/drm/i915/i915_pmu.h
> >>> +++ b/drivers/gpu/drm/i915/i915_pmu.h
> >>> @@ -38,13 +38,16 @@ enum {
> >>>	__I915_NUM_PMU_SAMPLERS
> >>>  };
> >>>
> >>> +#define I915_PMU_MAX_GTS (4)
> >>
> >> 4 or (4)? :-)
> >
> > Bike shed was strong with you on the day of review I see. :)
> >
> > I would rather get rid of this define altogether if we could use the
> > "normal" MAX_GT define. As I was saying earlier, I think this one was
> > here just because header dependencies were too convoluted back then. Maybe
> > today things are better? Worth a try probably.
>
> dropping this and using I915_MAX_GTS
>
> >
> >>
> >>> +
> >>>  /*
> >>>   * How many different events we track in the global PMU mask.
> >>>   *
> >>>   * It is also used to know to needed number of event reference counters.
> >>>   */
> >>>  #define I915_PMU_MASK_BITS \
> >>> -	(I915_ENGINE_SAMPLE_COUNT + __I915_PMU_TRACKED_EVENT_COUNT)
> >>> +	(I915_ENGINE_SAMPLE_COUNT + \
> >>> +	 I915_PMU_MAX_GTS * __I915_PMU_TRACKED_EVENT_COUNT)
> >>>
> >>>  #define I915_ENGINE_SAMPLE_COUNT (I915_SAMPLE_SEMA + 1)
> >>>
> >>> @@ -124,11 +127,11 @@ struct i915_pmu {
> >>>	 * Only global counters are held here, while the per-engine ones are in
> >>>	 * struct intel_engine_cs.
> >>>	 */
> >>> -	struct i915_pmu_sample sample[__I915_NUM_PMU_SAMPLERS];
> >>> +	struct i915_pmu_sample sample[I915_PMU_MAX_GTS * __I915_NUM_PMU_SAMPLERS];
> >>>	/**
> >>>	 * @sleep_last: Last time GT parked for RC6 estimation.
> >>>	 */
> >>> -	ktime_t sleep_last;
> >>> +	ktime_t sleep_last[I915_PMU_MAX_GTS];
> >>>	/**
> >>>	 * @irq_count: Number of interrupts
> >>>	 *
> >>> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> >>> index dba7c5a5b25e..d5ac1fdeb2b1 100644
> >>> --- a/include/uapi/drm/i915_drm.h
> >>> +++ b/include/uapi/drm/i915_drm.h
> >>> @@ -280,7 +280,16 @@ enum drm_i915_pmu_engine_sample {
> >>>  #define I915_PMU_ENGINE_SEMA(class, instance) \
> >>>	__I915_PMU_ENGINE(class, instance, I915_SAMPLE_SEMA)
> >>>
> >>> -#define __I915_PMU_OTHER(x) (__I915_PMU_ENGINE(0xff, 0xff, 0xf) + 1 + (x))
> >>> +/*
> >>> + * Top 4 bits of every non-engine counter are GT id.
> >>> + */
> >>> +#define __I915_PMU_GT_SHIFT (60)
> >>
> >> REG_GENMASK64 or GENMASK_ULL would be nicer but of course we can't put in
> >> the uapi header, so ok.
> >
> > Yep.
>
> leaving as is.
>
> >
> >>> +
> >>> +#define ___I915_PMU_OTHER(gt, x) \
> >>> +	(((__u64)__I915_PMU_ENGINE(0xff, 0xff, 0xf) + 1 + (x)) | \
> >>> +	((__u64)(gt) << __I915_PMU_GT_SHIFT))
> >>> +
> >>> +#define __I915_PMU_OTHER(x) ___I915_PMU_OTHER(0, x)
> >>>
> >>>  #define I915_PMU_ACTUAL_FREQUENCY	__I915_PMU_OTHER(0)
> >>>  #define I915_PMU_REQUESTED_FREQUENCY	__I915_PMU_OTHER(1)
> >>> @@ -290,6 +299,12 @@ enum drm_i915_pmu_engine_sample {
> >>>
> >>>  #define I915_PMU_LAST /* Deprecated - do not use */ I915_PMU_RC6_RESIDENCY
> >>>
> >>> +#define __I915_PMU_ACTUAL_FREQUENCY(gt)		___I915_PMU_OTHER(gt, 0)
> >>> +#define __I915_PMU_REQUESTED_FREQUENCY(gt)	___I915_PMU_OTHER(gt, 1)
> >>> +#define __I915_PMU_INTERRUPTS(gt)		___I915_PMU_OTHER(gt, 2)
> >>> +#define __I915_PMU_RC6_RESIDENCY(gt)		___I915_PMU_OTHER(gt, 3)
> >>> +#define __I915_PMU_SOFTWARE_GT_AWAKE_TIME(gt)	___I915_PMU_OTHER(gt, 4)
> >>> +
> >>>  /* Each region is a minimum of 16k, and there are at most 255 of them.
> >>>   */
> >>>  #define I915_NR_TEX_REGIONS 255	/* table size 2k - maximum due to use
> >>> --
> >>> 2.36.1
> >>>
> >>
> >> Above comments are mostly nits so after addressing the above comments, this
> >> is:
> >>
> >> Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
> >
> > Well you found some ugly bits (or I got confused, double check me please)
> > so I'd say hold off with r-b just yet. Sadly it's on Umesh now to fix up
> > my mess. :I
>
> I don't see anything wrong with the SECTION I marked above. As in, the
> pmu_needs_timer and the sampling code for events that need to be
> sampled. If you agree, I can spin the next revision.

So overall I am fine with Umesh's proposal mentioned above, and Umesh, you
can keep my R-b. So if Tvrtko is also OK, please spin the next rev.

Thanks.
--
Ashutosh


* [Intel-gfx] [PATCH 4/6] drm/i915/pmu: Add reference counting to the sampling timer
  2023-05-13  1:55 [Intel-gfx] [PATCH 0/6] Add MTL PMU support for multi-gt Umesh Nerlige Ramappa
@ 2023-05-13  1:55 ` Umesh Nerlige Ramappa
  2023-05-13  3:01   ` Dixit, Ashutosh
  0 siblings, 1 reply; 45+ messages in thread
From: Umesh Nerlige Ramappa @ 2023-05-13  1:55 UTC (permalink / raw)
  To: intel-gfx, Tvrtko Ursulin, Ashutosh Dixit

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

We do not want to have timers per tile and waste CPU cycles and energy via
multiple wake-up sources, for a relatively un-important task of PMU
sampling, so keeping a single timer works well. But we also do not want
the first GT which goes idle to turn off the timer.

Add some reference counting, via a mask of unparked GTs, to solve this.

v2: Drop the check for unparked in i915_sample (Ashutosh)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
---
 drivers/gpu/drm/i915/i915_pmu.c | 9 +++++++--
 drivers/gpu/drm/i915/i915_pmu.h | 4 ++++
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index 2b63ee31e1b3..725b01b00775 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -251,7 +251,9 @@ void i915_pmu_gt_parked(struct intel_gt *gt)
 	 * Signal sampling timer to stop if only engine events are enabled and
 	 * GPU went idle.
 	 */
-	pmu->timer_enabled = pmu_needs_timer(pmu, false);
+	pmu->unparked &= ~BIT(gt->info.id);
+	if (pmu->unparked == 0)
+		pmu->timer_enabled = pmu_needs_timer(pmu, false);
 
 	spin_unlock_irq(&pmu->lock);
 }
@@ -268,7 +270,10 @@ void i915_pmu_gt_unparked(struct intel_gt *gt)
 	/*
 	 * Re-enable sampling timer when GPU goes active.
 	 */
-	__i915_pmu_maybe_start_timer(pmu);
+	if (pmu->unparked == 0)
+		__i915_pmu_maybe_start_timer(pmu);
+
+	pmu->unparked |= BIT(gt->info.id);
 
 	spin_unlock_irq(&pmu->lock);
 }
diff --git a/drivers/gpu/drm/i915/i915_pmu.h b/drivers/gpu/drm/i915/i915_pmu.h
index a686fd7ccedf..3a811266ac6a 100644
--- a/drivers/gpu/drm/i915/i915_pmu.h
+++ b/drivers/gpu/drm/i915/i915_pmu.h
@@ -76,6 +76,10 @@ struct i915_pmu {
 	 * @lock: Lock protecting enable mask and ref count handling.
 	 */
 	spinlock_t lock;
+	/**
+	 * @unparked: GT unparked mask.
+	 */
+	unsigned int unparked;
 	/**
 	 * @timer: Timer for internal i915 PMU sampling.
 	 */
-- 
2.36.1



* Re: [Intel-gfx] [PATCH 4/6] drm/i915/pmu: Add reference counting to the sampling timer
  2023-05-13  1:55 ` [Intel-gfx] [PATCH 4/6] drm/i915/pmu: Add reference counting to the sampling timer Umesh Nerlige Ramappa
@ 2023-05-13  3:01   ` Dixit, Ashutosh
  0 siblings, 0 replies; 45+ messages in thread
From: Dixit, Ashutosh @ 2023-05-13  3:01 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-gfx

On Fri, 12 May 2023 18:55:43 -0700, Umesh Nerlige Ramappa wrote:
>
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>
> We do not want to have timers per tile and waste CPU cycles and energy via
> multiple wake-up sources, for a relatively un-important task of PMU
> sampling, so keeping a single timer works well. But we also do not want
> the first GT which goes idle to turn off the timer.
>
> Add some reference counting, via a mask of unparked GTs, to solve this.
>
> v2: Drop the check for unparked in i915_sample (Ashutosh)

Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>

>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_pmu.c | 9 +++++++--
>  drivers/gpu/drm/i915/i915_pmu.h | 4 ++++
>  2 files changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
> index 2b63ee31e1b3..725b01b00775 100644
> --- a/drivers/gpu/drm/i915/i915_pmu.c
> +++ b/drivers/gpu/drm/i915/i915_pmu.c
> @@ -251,7 +251,9 @@ void i915_pmu_gt_parked(struct intel_gt *gt)
>	 * Signal sampling timer to stop if only engine events are enabled and
>	 * GPU went idle.
>	 */
> -	pmu->timer_enabled = pmu_needs_timer(pmu, false);
> +	pmu->unparked &= ~BIT(gt->info.id);
> +	if (pmu->unparked == 0)
> +		pmu->timer_enabled = pmu_needs_timer(pmu, false);
>
>	spin_unlock_irq(&pmu->lock);
>  }
> @@ -268,7 +270,10 @@ void i915_pmu_gt_unparked(struct intel_gt *gt)
>	/*
>	 * Re-enable sampling timer when GPU goes active.
>	 */
> -	__i915_pmu_maybe_start_timer(pmu);
> +	if (pmu->unparked == 0)
> +		__i915_pmu_maybe_start_timer(pmu);
> +
> +	pmu->unparked |= BIT(gt->info.id);
>
>	spin_unlock_irq(&pmu->lock);
>  }
> diff --git a/drivers/gpu/drm/i915/i915_pmu.h b/drivers/gpu/drm/i915/i915_pmu.h
> index a686fd7ccedf..3a811266ac6a 100644
> --- a/drivers/gpu/drm/i915/i915_pmu.h
> +++ b/drivers/gpu/drm/i915/i915_pmu.h
> @@ -76,6 +76,10 @@ struct i915_pmu {
>	 * @lock: Lock protecting enable mask and ref count handling.
>	 */
>	spinlock_t lock;
> +	/**
> +	 * @unparked: GT unparked mask.
> +	 */
> +	unsigned int unparked;
>	/**
>	 * @timer: Timer for internal i915 PMU sampling.
>	 */
> --
> 2.36.1
>


* [Intel-gfx] [PATCH 4/6] drm/i915/pmu: Add reference counting to the sampling timer
  2023-05-15  6:44 [Intel-gfx] [PATCH v4 0/6] Add MTL PMU support for multi-gt Umesh Nerlige Ramappa
@ 2023-05-15  6:44 ` Umesh Nerlige Ramappa
  0 siblings, 0 replies; 45+ messages in thread
From: Umesh Nerlige Ramappa @ 2023-05-15  6:44 UTC (permalink / raw)
  To: intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

We do not want to have timers per tile and waste CPU cycles and energy via
multiple wake-up sources, for a relatively un-important task of PMU
sampling, so keeping a single timer works well. But we also do not want
the first GT which goes idle to turn off the timer.

Add some reference counting, via a mask of unparked GTs, to solve this.

v2: Drop the check for unparked in i915_sample (Ashutosh)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
---
 drivers/gpu/drm/i915/i915_pmu.c | 9 +++++++--
 drivers/gpu/drm/i915/i915_pmu.h | 4 ++++
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index 2b63ee31e1b3..725b01b00775 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -251,7 +251,9 @@ void i915_pmu_gt_parked(struct intel_gt *gt)
 	 * Signal sampling timer to stop if only engine events are enabled and
 	 * GPU went idle.
 	 */
-	pmu->timer_enabled = pmu_needs_timer(pmu, false);
+	pmu->unparked &= ~BIT(gt->info.id);
+	if (pmu->unparked == 0)
+		pmu->timer_enabled = pmu_needs_timer(pmu, false);
 
 	spin_unlock_irq(&pmu->lock);
 }
@@ -268,7 +270,10 @@ void i915_pmu_gt_unparked(struct intel_gt *gt)
 	/*
 	 * Re-enable sampling timer when GPU goes active.
 	 */
-	__i915_pmu_maybe_start_timer(pmu);
+	if (pmu->unparked == 0)
+		__i915_pmu_maybe_start_timer(pmu);
+
+	pmu->unparked |= BIT(gt->info.id);
 
 	spin_unlock_irq(&pmu->lock);
 }
diff --git a/drivers/gpu/drm/i915/i915_pmu.h b/drivers/gpu/drm/i915/i915_pmu.h
index a686fd7ccedf..3a811266ac6a 100644
--- a/drivers/gpu/drm/i915/i915_pmu.h
+++ b/drivers/gpu/drm/i915/i915_pmu.h
@@ -76,6 +76,10 @@ struct i915_pmu {
 	 * @lock: Lock protecting enable mask and ref count handling.
 	 */
 	spinlock_t lock;
+	/**
+	 * @unparked: GT unparked mask.
+	 */
+	unsigned int unparked;
 	/**
 	 * @timer: Timer for internal i915 PMU sampling.
 	 */
-- 
2.36.1



* Re: [Intel-gfx] [PATCH 4/6] drm/i915/pmu: Add reference counting to the sampling timer
  2023-05-12 23:44         ` Umesh Nerlige Ramappa
@ 2023-05-15  9:52           ` Tvrtko Ursulin
  2023-05-15 21:24             ` Dixit, Ashutosh
  0 siblings, 1 reply; 45+ messages in thread
From: Tvrtko Ursulin @ 2023-05-15  9:52 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa, Dixit, Ashutosh; +Cc: intel-gfx


On 13/05/2023 00:44, Umesh Nerlige Ramappa wrote:
> On Fri, May 12, 2023 at 04:20:19PM -0700, Dixit, Ashutosh wrote:
>> On Fri, 12 May 2023 15:44:00 -0700, Umesh Nerlige Ramappa wrote:
>>>
>>> On Fri, May 12, 2023 at 03:29:03PM -0700, Dixit, Ashutosh wrote:
>>> > On Fri, 05 May 2023 17:58:14 -0700, Umesh Nerlige Ramappa wrote:
>>> >>
>>> >
>>> > Hi Umesh/Tvrtko,
>>> >
>>> >> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>> >>
>>> >> We do not want to have timers per tile and waste CPU cycles and energy via
>>> >> multiple wake-up sources, for a relatively un-important task of PMU
>>> >> sampling, so keeping a single timer works well. But we also do not want
>>> >> the first GT which goes idle to turn off the timer.
>>> >>
>>> >> Add some reference counting, via a mask of unparked GTs, to solve this.
>>> >>
>>> >> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>> >> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
>>> >> ---
>>> >>  drivers/gpu/drm/i915/i915_pmu.c | 12 ++++++++++--
>>> >>  drivers/gpu/drm/i915/i915_pmu.h |  4 ++++
>>> >>  2 files changed, 14 insertions(+), 2 deletions(-)
>>> >>
>>> >> diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
>>> >> index 2b63ee31e1b3..669a42e44082 100644
>>> >> --- a/drivers/gpu/drm/i915/i915_pmu.c
>>> >> +++ b/drivers/gpu/drm/i915/i915_pmu.c
>>> >> @@ -251,7 +251,9 @@ void i915_pmu_gt_parked(struct intel_gt *gt)
>>> >>     * Signal sampling timer to stop if only engine events are enabled and
>>> >>     * GPU went idle.
>>> >>     */
>>> >> -    pmu->timer_enabled = pmu_needs_timer(pmu, false);
>>> >> +    pmu->unparked &= ~BIT(gt->info.id);
>>> >> +    if (pmu->unparked == 0)
>>> >> +        pmu->timer_enabled = pmu_needs_timer(pmu, false);
>>> >>
>>> >>    spin_unlock_irq(&pmu->lock);
>>> >>  }
>>> >> @@ -268,7 +270,10 @@ void i915_pmu_gt_unparked(struct intel_gt *gt)
>>> >>    /*
>>> >>     * Re-enable sampling timer when GPU goes active.
>>> >>     */
>>> >> -    __i915_pmu_maybe_start_timer(pmu);
>>> >> +    if (pmu->unparked == 0)
>>> >> +        __i915_pmu_maybe_start_timer(pmu);
>>> >> +
>>> >> +    pmu->unparked |= BIT(gt->info.id);
>>> >>
>>> >>    spin_unlock_irq(&pmu->lock);
>>> >>  }
>>> >> @@ -438,6 +443,9 @@ static enum hrtimer_restart i915_sample(struct hrtimer *hrtimer)
>>> >>     */
>>> >>
>>> >>    for_each_gt(gt, i915, i) {
>>> >> +        if (!(pmu->unparked & BIT(i)))
>>> >> +            continue;
>>> >> +
>>> >
>>> > This is not correct. In this series we are at least sampling frequencies
>>> > (calling frequency_sample) even when GT is parked. So these 3 lines should be
>>> > deleted. engines_sample will get called and will return without doing
>>> > anything if engine events are disabled.
>>>
>>> Not sure I understand. This is checking pmu->'un'parked bits.
>>
>> Sorry, my bad. Not "engines_sample will get called and will return without
>> doing anything if engine events are disabled" but "engines_sample will get
>> called and will return without doing anything if GT is not awake". This is
>> the same as the previous behavior before this series.
>>
>> Umesh and I discussed this but writing this out in case Tvrtko takes a
>> look.
> 
> Sounds good. Dropping the check here in the new revision.

I think it is safe to not have the check, but I didn't quite understand 
the "this is not correct" part. I can only see the argument that it 
could be redundant, not that it is incorrect.

In which case I think it had better stay, since it is way more 
efficient, given this gets called at 200Hz, than the *atomic* 
atomic_inc_not_zero (from intel_wakeref_get_if_active).

Regards,

Tvrtko


* Re: [Intel-gfx] [PATCH 5/6] drm/i915/pmu: Prepare for multi-tile non-engine counters
  2023-05-12 20:57       ` Umesh Nerlige Ramappa
  2023-05-12 22:37         ` Umesh Nerlige Ramappa
  2023-05-13  1:09         ` Dixit, Ashutosh
@ 2023-05-15 10:10         ` Tvrtko Ursulin
  2023-05-15 22:07           ` Dixit, Ashutosh
  2 siblings, 1 reply; 45+ messages in thread
From: Tvrtko Ursulin @ 2023-05-15 10:10 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-gfx


On 12/05/2023 21:57, Umesh Nerlige Ramappa wrote:
> On Fri, May 12, 2023 at 11:56:18AM +0100, Tvrtko Ursulin wrote:
>>
>> On 12/05/2023 02:08, Dixit, Ashutosh wrote:
>>> On Fri, 05 May 2023 17:58:15 -0700, Umesh Nerlige Ramappa wrote:
>>>>
>>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>
>>>> Reserve some bits in the counter config namespace which will carry the
>>>> tile id and prepare the code to handle this.
>>>>
>>>> No per tile counters have been added yet.
>>>>
>>>> v2:
>>>> - Fix checkpatch issues
>>>> - Use 4 bits for gt id in non-engine counters. Drop FIXME.
>>>> - Set MAX GTs to 4. Drop FIXME.
>>>>
>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

8<

>>>> +static u64 frequency_enabled_mask(void)
>>>> +{
>>>> +    unsigned int i;
>>>> +    u64 mask = 0;
>>>> +
>>>> +    for (i = 0; i < I915_PMU_MAX_GTS; i++)
>>>> +        mask |= config_mask(__I915_PMU_ACTUAL_FREQUENCY(i)) |
>>>> +            config_mask(__I915_PMU_REQUESTED_FREQUENCY(i));
>>>> +
>>>> +    return mask;
>>>> +}
>>>> +
>>>>  static bool pmu_needs_timer(struct i915_pmu *pmu, bool gpu_active)
>>>>  {
>>>>     struct drm_i915_private *i915 = container_of(pmu, typeof(*i915), pmu);
>>>> @@ -120,9 +147,7 @@ static bool pmu_needs_timer(struct i915_pmu *pmu, bool gpu_active)
>>>>      * Mask out all the ones which do not need the timer, or in
>>>>      * other words keep all the ones that could need the timer.
>>>>      */
>>>> -    enable &= config_mask(I915_PMU_ACTUAL_FREQUENCY) |
>>>> -          config_mask(I915_PMU_REQUESTED_FREQUENCY) |
>>>> -          ENGINE_SAMPLE_MASK;
>>>> +    enable &= frequency_enabled_mask() | ENGINE_SAMPLE_MASK;
>>>
>>> u32 enable & u64 frequency_enabled_mask
>>>
>>> ugly but ok I guess? Or change enable to u64?
> 
> making pmu->enable u64 as well as other places where it is assigned to 
> local variables.
> 
>>
>> Hmm.. yes very ugly. Could have been an accident which happened to 
>> work because there is a single timer (not per tile).
> 
> Happened to work because the frequency mask does not spill over to the 
> upper 32 bits (even for multi tile).
> 
>>
>> Similar issue in frequency_sampling_enabled too. Gt_id argument to it 
>> seems pointless.
> 
> Not sure why it's pointless. We need the gt_id to determine the right 
> mask for that specific gt. If it's not enabled, then we just return 
> without pm_get and async put (like you mention later).
> And this piece of code is called within for_each_gt.

I think I got a little confused cross referencing the code and patches 
last week and did not mentally see all the changes.

Because the hunk in other_bit() is correctly adding support for per gt bits.

The layout of pmu->enable ends up like this:

bits  0 -  2: engine events
bits  3 -  5: gt0 other events
bits  6 -  8: gt1 other events
bits  9 - 11: gt2 other events
bits 12 - 14: gt3 other events

>> So I now think whole frequency_enabled_mask() is just pointless and 
>> should be removed. And then pmu_needs_timer code can stay as is. 
>> Possibly add a config_mask_32 helper which ensures no bits in upper 32 
>> bits are returned.
>>
>> That is if we are happy for the frequency_sampling_enabled returning 
>> true for all gts, regardless of which ones actually have frequency 
>> sampling enabled.
>>
>> Or if we want to implement it as I probably have intended, we will 
>> need to add some gt bits into pmu->enable. Maybe reserve top four same 
>> as with config counters.
> 
> Nope. What you have here works just fine. pmu->enable should not include 
> any gt id info. gt_id[63:60] is only a concept for pmu config sent by 
> user.  config_mask and pmu->enable are i915 internal bookkeeping (bit 
> masks) just to track what events need to be sampled.  The 'other' bit 
> masks are a function of gt_id because we use gt_id to calculate a 
> contiguous numerical value for these 'other' events. That's all. Once 
> the numerical value is calculated, there is no need for gt_id because 
> config_mask is BIT_ULL(numerical_value). Since the numerical values 
> never exceeded 31 (even for multi-gts), everything worked even with 32 
> bit pmu->enable.

Yep.

So question then is why make pmu->enable u64?

Instead frequency_enabled_mask() should be made u32 since the bitwise or 
composition of config_masks() is guaranteed to fit.

At most it can have an internal u64 for the mask, assert upper_32_bits 
are zero and return lower_32_bits.

Did I get it right this time round? :)

Regards,

Tvrtko


* Re: [Intel-gfx] [PATCH 4/6] drm/i915/pmu: Add reference counting to the sampling timer
  2023-05-15  9:52           ` Tvrtko Ursulin
@ 2023-05-15 21:24             ` Dixit, Ashutosh
  2023-05-16  7:12               ` Tvrtko Ursulin
  0 siblings, 1 reply; 45+ messages in thread
From: Dixit, Ashutosh @ 2023-05-15 21:24 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

On Mon, 15 May 2023 02:52:35 -0700, Tvrtko Ursulin wrote:
>
> On 13/05/2023 00:44, Umesh Nerlige Ramappa wrote:
> > On Fri, May 12, 2023 at 04:20:19PM -0700, Dixit, Ashutosh wrote:
> >> On Fri, 12 May 2023 15:44:00 -0700, Umesh Nerlige Ramappa wrote:
> >>>
> >>> On Fri, May 12, 2023 at 03:29:03PM -0700, Dixit, Ashutosh wrote:
> >>> > On Fri, 05 May 2023 17:58:14 -0700, Umesh Nerlige Ramappa wrote:
> >>> >>
> >>> >
> >>> > Hi Umesh/Tvrtko,
> >>> >
> >>> >> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>> >>
> >>> >> We do not want to have timers per tile and waste CPU cycles and energy via
> >>> >> multiple wake-up sources, for a relatively un-important task of PMU
> >>> >> sampling, so keeping a single timer works well. But we also do not want
> >>> >> the first GT which goes idle to turn off the timer.
> >>> >>
> >>> >> Add some reference counting, via a mask of unparked GTs, to solve this.
> >>> >>
> >>> >> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>> >> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
> >>> >> ---
> >>> >>  drivers/gpu/drm/i915/i915_pmu.c | 12 ++++++++++--
> >>> >>  drivers/gpu/drm/i915/i915_pmu.h |  4 ++++
> >>> >>  2 files changed, 14 insertions(+), 2 deletions(-)
> >>> >>
> >>> >> diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
> >>> >> index 2b63ee31e1b3..669a42e44082 100644
> >>> >> --- a/drivers/gpu/drm/i915/i915_pmu.c
> >>> >> +++ b/drivers/gpu/drm/i915/i915_pmu.c
> >>> >> @@ -251,7 +251,9 @@ void i915_pmu_gt_parked(struct intel_gt *gt)
> >>> >>     * Signal sampling timer to stop if only engine events are enabled and
> >>> >>     * GPU went idle.
> >>> >>     */
> >>> >> -    pmu->timer_enabled = pmu_needs_timer(pmu, false);
> >>> >> +    pmu->unparked &= ~BIT(gt->info.id);
> >>> >> +    if (pmu->unparked == 0)
> >>> >> +        pmu->timer_enabled = pmu_needs_timer(pmu, false);
> >>> >>
> >>> >>    spin_unlock_irq(&pmu->lock);
> >>> >>  }
> >>> >> @@ -268,7 +270,10 @@ void i915_pmu_gt_unparked(struct intel_gt *gt)
> >>> >>    /*
> >>> >>     * Re-enable sampling timer when GPU goes active.
> >>> >>     */
> >>> >> -    __i915_pmu_maybe_start_timer(pmu);
> >>> >> +    if (pmu->unparked == 0)
> >>> >> +        __i915_pmu_maybe_start_timer(pmu);
> >>> >> +
> >>> >> +    pmu->unparked |= BIT(gt->info.id);
> >>> >>
> >>> >>    spin_unlock_irq(&pmu->lock);
> >>> >>  }
> >>> >> @@ -438,6 +443,9 @@ static enum hrtimer_restart i915_sample(struct hrtimer *hrtimer)
> >>> >>     */
> >>> >>
> >>> >>    for_each_gt(gt, i915, i) {
> >>> >> +        if (!(pmu->unparked & BIT(i)))
> >>> >> +            continue;
> >>> >> +
> >>> >
> >>> > This is not correct. In this series we are at least sampling frequencies
> >>> > (calling frequency_sample) even when GT is parked. So these 3 lines should be
> >>> > deleted. engines_sample will get called and will return without doing
> >>> > anything if engine events are disabled.
> >>>
> >>> Not sure I understand. This is checking pmu->'un'parked bits.
> >>
> >> Sorry, my bad. Not "engines_sample will get called and will return without
> >> doing anything if engine events are disabled" but "engines_sample will get
> >> called and will return without doing anything if GT is not awake". This is
> >> the same as the previous behavior before this series.
> >>
> >> Umesh and I discussed this but writing this out in case Tvrtko takes a
> >> look.
> >
> > Sounds good. Dropping the check here in the new revision.

Hi Tvrtko,

> I think it is safe to not have the check, but I didn't quite understand the
> "this is not correct" part. I can only see the argument that it could be
> redundant, not that it is incorrect.

I said that it looks incorrect to me because in this series we are still
sampling freq when gt is parked and we would be skipping that if we
included:
		if (!(pmu->unparked & BIT(i)))
			continue;

> In which case I think it had better stay, since it is way more efficient,
> given this gets called at 200Hz, than the *atomic* atomic_inc_not_zero
> (from intel_wakeref_get_if_active).

As far as efficiency goes, once we merge the patch below (I have a v2 based on
your suggestion but I am waiting till Umesh's series gets merged):

https://patchwork.freedesktop.org/series/117658/

this will turn off the timer itself, which will be even more efficient
than the above code, where the timer keeps running and we then skip. So
after the link above is merged, the above code will be truly redundant.
That was the second reason why I said to delete it.

Thanks.
--
Ashutosh


* Re: [Intel-gfx] [PATCH 5/6] drm/i915/pmu: Prepare for multi-tile non-engine counters
  2023-05-15 10:10         ` Tvrtko Ursulin
@ 2023-05-15 22:07           ` Dixit, Ashutosh
  2023-05-16  8:35             ` Tvrtko Ursulin
  0 siblings, 1 reply; 45+ messages in thread
From: Dixit, Ashutosh @ 2023-05-15 22:07 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

On Mon, 15 May 2023 03:10:56 -0700, Tvrtko Ursulin wrote:
>

Hi Tvrtko,

> On 12/05/2023 21:57, Umesh Nerlige Ramappa wrote:
> > On Fri, May 12, 2023 at 11:56:18AM +0100, Tvrtko Ursulin wrote:
> >>
> >> On 12/05/2023 02:08, Dixit, Ashutosh wrote:
> >>> On Fri, 05 May 2023 17:58:15 -0700, Umesh Nerlige Ramappa wrote:
> >>>>
> >>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>>>
> >>>> Reserve some bits in the counter config namespace which will carry the
> >>>> tile id and prepare the code to handle this.
> >>>>
> >>>> No per tile counters have been added yet.
> >>>>
> >>>> v2:
> >>>> - Fix checkpatch issues
> >>>> - Use 4 bits for gt id in non-engine counters. Drop FIXME.
> >>>> - Set MAX GTs to 4. Drop FIXME.
> >>>>
> >>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>>> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
>
> 8<
>
> >>>> +static u64 frequency_enabled_mask(void)
> >>>> +{
> >>>> +    unsigned int i;
> >>>> +    u64 mask = 0;
> >>>> +
> >>>> +    for (i = 0; i < I915_PMU_MAX_GTS; i++)
> >>>> +        mask |= config_mask(__I915_PMU_ACTUAL_FREQUENCY(i)) |
> >>>> +            config_mask(__I915_PMU_REQUESTED_FREQUENCY(i));
> >>>> +
> >>>> +    return mask;
> >>>> +}
> >>>> +
> >>>>  static bool pmu_needs_timer(struct i915_pmu *pmu, bool gpu_active)
> >>>>  {
> >>>>     struct drm_i915_private *i915 = container_of(pmu, typeof(*i915), pmu);
> >>>> @@ -120,9 +147,7 @@ static bool pmu_needs_timer(struct i915_pmu *pmu, bool gpu_active)
> >>>>      * Mask out all the ones which do not need the timer, or in
> >>>>      * other words keep all the ones that could need the timer.
> >>>>      */
> >>>> -    enable &= config_mask(I915_PMU_ACTUAL_FREQUENCY) |
> >>>> -          config_mask(I915_PMU_REQUESTED_FREQUENCY) |
> >>>> -          ENGINE_SAMPLE_MASK;
> >>>> +    enable &= frequency_enabled_mask() | ENGINE_SAMPLE_MASK;
> >>>
> >>> u32 enable & u64 frequency_enabled_mask
> >>>
> >>> ugly but ok I guess? Or change enable to u64?
> >
> > making pmu->enable u64 as well as other places where it is assigned to
> > local variables.
> >
> >>
> >> Hmm.. yes very ugly. Could have been an accident which happened to work
> >> because there is a single timer (not per tile).
> >
> > Happened to work because the frequency mask does not spill over to the
> > upper 32 bits (even for multi tile).
> >
> >>
> >> Similar issue in frequency_sampling_enabled too. Gt_id argument to it
> >> seems pointless.
> >
> > Not sure why it's pointless. We need the gt_id to determine the right
> > mask for that specific gt. If it's not enabled, then we just return
> > without pm_get and async put (like you mention later).
> > And this piece of code is called within for_each_gt.
>
> I think I got a little confused cross referencing the code and patches last
> week and did not mentally see all the changes.
>
> Because the hunk in other_bit() is correctly adding support for per gt bits.
>
> The layout of pmu->enable ends up like this:
>
> bits  0 -  2: engine events
> bits  3 -  5: gt0 other events
> bits  6 -  8: gt1 other events
> bits  9 - 11: gt2 other events
> bits 12 - 14: gt3 other events

Correct.

>
> >> So I now think whole frequency_enabled_mask() is just pointless and
> >> should be removed. And then pmu_needs_timer code can stay as is. Possibly
> >> add a config_mask_32 helper which ensures no bits in upper 32 bits are
> >> returned.
> >>
> >> That is if we are happy for the frequency_sampling_enabled returning
> >> true for all gts, regardless of which ones actually have frequency
> >> sampling enabled.
> >>
> >> Or if we want to implement it as I probably have intended, we will need
> >> to add some gt bits into pmu->enable. Maybe reserve top four same as
> >> with config counters.
> >
> > Nope. What you have here works just fine. pmu->enable should not include
> > any gt id info. gt_id[63:60] is only a concept for pmu config sent by
> > user.  config_mask and pmu->enable are i915 internal bookkeeping (bit
> > masks) just to track what events need to be sampled.  The 'other' bit
> > masks are a function of gt_id because we use gt_id to calculate a
> > contiguous numerical value for these 'other' events. That's all. Once the
> > numerical value is calculated, there is no need for gt_id because
> > config_mask is BIT_ULL(numerical_value). Since the numerical values never
> > exceeded 31 (even for multi-gts), everything worked even with 32 bit
> > pmu->enable.
>
> Yep.
>
> So question then is why make pmu->enable u64?

The only reason was simplicity, since a lot of the existing code already
assumes u64.

E.g. if we keep pmu->enable u32, we would have to do the following:

* Change config_mask() return type to u32 (in frequency_sampling_enabled(),
  we have 'pmu->enable & config_mask()')
* Change frequency_enabled_mask() return type to u32 (again uses
  config_mask() so if we change config_mask() to u32 we change return type
  here too)
* In i915_pmu_enable(), change 'pmu->enable |= BIT_ULL(bit)' to
  'pmu->enable |= BIT(bit)'

So yes, if we think we should be using pmu->enable u32, let's change this
to be consistent everywhere.

> Instead frequency_enabled_mask() should be made u32 since the bitwise or
> composition of config_masks() is guaranteed to fit.
>
> At most it can have an internal u64 for the mask, assert upper_32_bits are
> zero and return lower_32_bits.
>
> Did I get it right this time round? :)

Yes, though we'd have to make the config_mask() type change above to make
frequency_enabled_mask() u32. Or we can just keep everything u64. Let's
decide one way or the other and close this. It seems Tvrtko is leaning
towards making pmu->enable and frequency_enabled_mask() u32?

Thanks.
--
Ashutosh


* Re: [Intel-gfx] [PATCH 4/6] drm/i915/pmu: Add reference counting to the sampling timer
  2023-05-15 21:24             ` Dixit, Ashutosh
@ 2023-05-16  7:12               ` Tvrtko Ursulin
  2023-05-16 16:29                 ` Dixit, Ashutosh
  0 siblings, 1 reply; 45+ messages in thread
From: Tvrtko Ursulin @ 2023-05-16  7:12 UTC (permalink / raw)
  To: Dixit, Ashutosh; +Cc: intel-gfx


On 15/05/2023 22:24, Dixit, Ashutosh wrote:
> On Mon, 15 May 2023 02:52:35 -0700, Tvrtko Ursulin wrote:
>>
>> On 13/05/2023 00:44, Umesh Nerlige Ramappa wrote:
>>> On Fri, May 12, 2023 at 04:20:19PM -0700, Dixit, Ashutosh wrote:
>>>> On Fri, 12 May 2023 15:44:00 -0700, Umesh Nerlige Ramappa wrote:
>>>>>
>>>>> On Fri, May 12, 2023 at 03:29:03PM -0700, Dixit, Ashutosh wrote:
>>>>>> On Fri, 05 May 2023 17:58:14 -0700, Umesh Nerlige Ramappa wrote:
>>>>>>>
>>>>>>
>>>>>> Hi Umesh/Tvrtko,
>>>>>>
>>>>>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>>>>
>>>>>>> We do not want to have timers per tile and waste CPU cycles and energy via
>>>>>>> multiple wake-up sources, for a relatively un-important task of PMU
>>>>>>> sampling, so keeping a single timer works well. But we also do not want
>>>>>>> the first GT which goes idle to turn off the timer.
>>>>>>>
>>>>>>> Add some reference counting, via a mask of unparked GTs, to solve this.
>>>>>>>
>>>>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>>>> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
>>>>>>> ---
>>>>>>>    drivers/gpu/drm/i915/i915_pmu.c | 12 ++++++++++--
>>>>>>>    drivers/gpu/drm/i915/i915_pmu.h |  4 ++++
>>>>>>>    2 files changed, 14 insertions(+), 2 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
>>>>>>> index 2b63ee31e1b3..669a42e44082 100644
>>>>>>> --- a/drivers/gpu/drm/i915/i915_pmu.c
>>>>>>> +++ b/drivers/gpu/drm/i915/i915_pmu.c
>>>>>>> @@ -251,7 +251,9 @@ void i915_pmu_gt_parked(struct intel_gt *gt)
>>>>>>>       * Signal sampling timer to stop if only engine events are enabled and
>>>>>>>       * GPU went idle.
>>>>>>>       */
>>>>>>> -    pmu->timer_enabled = pmu_needs_timer(pmu, false);
>>>>>>> +    pmu->unparked &= ~BIT(gt->info.id);
>>>>>>> +    if (pmu->unparked == 0)
>>>>>>> +        pmu->timer_enabled = pmu_needs_timer(pmu, false);
>>>>>>>
>>>>>>>      spin_unlock_irq(&pmu->lock);
>>>>>>>    }
>>>>>>> @@ -268,7 +270,10 @@ void i915_pmu_gt_unparked(struct intel_gt *gt)
>>>>>>>      /*
>>>>>>>       * Re-enable sampling timer when GPU goes active.
>>>>>>>       */
>>>>>>> -    __i915_pmu_maybe_start_timer(pmu);
>>>>>>> +    if (pmu->unparked == 0)
>>>>>>> +        __i915_pmu_maybe_start_timer(pmu);
>>>>>>> +
>>>>>>> +    pmu->unparked |= BIT(gt->info.id);
>>>>>>>
>>>>>>>      spin_unlock_irq(&pmu->lock);
>>>>>>>    }
>>>>>>> @@ -438,6 +443,9 @@ static enum hrtimer_restart i915_sample(struct hrtimer *hrtimer)
>>>>>>>       */
>>>>>>>
>>>>>>>      for_each_gt(gt, i915, i) {
>>>>>>> +        if (!(pmu->unparked & BIT(i)))
>>>>>>> +            continue;
>>>>>>> +
>>>>>>
>>>>>> This is not correct. In this series we are at least sampling frequencies
>>>>>> (calling frequency_sample) even when GT is parked. So these 3 lines should be
>>>>>> deleted. engines_sample will get called and will return without doing
>>>>>> anything if engine events are disabled.
>>>>>
>>>>> Not sure I understand. This is checking pmu->'un'parked bits.
>>>>
>>>> Sorry, my bad. Not "engines_sample will get called and will return without
>>>> doing anything if engine events are disabled" but "engines_sample will get
>>>> called and will return without doing anything if GT is not awake". This is
>>>> the same as the previous behavior before this series.
>>>>
>>>> Umesh and I discussed this but writing this out in case Tvrtko takes a
>>>> look.
>>>
>>> Sounds good. Dropping the check here in the new revision.
> 
> Hi Tvrtko,
> 
>> I think it is safe to not have the check, but I didn't quite understand the
>> "this is not correct" part. I can only see the argument that it could be
>> redundant, not that it is incorrect.
> 
> I said that it looks incorrect to me because in this series we are still
> sampling freq when gt is parked and we would be skipping that if we
> included:
> 		if (!(pmu->unparked & BIT(i)))
> 			continue;

Ah okay. We concluded in your upstream patch that this looks like an omission.

>> In which case I think it had better stay, since it is way more efficient,
>> given this gets called at 200Hz, than the *atomic* atomic_inc_not_zero
>> (from intel_wakeref_get_if_active).
> 
> As far as efficiency goes, once we merge the patch below (I have a v2 based on
> your suggestion but I am waiting till Umesh's series gets merged):
> 
> https://patchwork.freedesktop.org/series/117658/
> 
> this will turn off the timer itself, which will be even more efficient
> than the above code, where the timer keeps running and we then skip. So
> after the link above is merged, the above code will be truly redundant.
> That was the second reason why I said to delete it.

On multi-tile, where not all tiles are being looked at, it still pays to 
avoid the atomic check. That doesn't apply to tools like intel_gpu_top, 
which monitor all tiles, but I still think there isn't any harm in 
having the fast check.

Regards,

Tvrtko


* Re: [Intel-gfx] [PATCH 5/6] drm/i915/pmu: Prepare for multi-tile non-engine counters
  2023-05-15 22:07           ` Dixit, Ashutosh
@ 2023-05-16  8:35             ` Tvrtko Ursulin
  0 siblings, 0 replies; 45+ messages in thread
From: Tvrtko Ursulin @ 2023-05-16  8:35 UTC (permalink / raw)
  To: Dixit, Ashutosh; +Cc: intel-gfx


On 15/05/2023 23:07, Dixit, Ashutosh wrote:
> On Mon, 15 May 2023 03:10:56 -0700, Tvrtko Ursulin wrote:
>>
> 
> Hi Tvrtko,
> 
>> On 12/05/2023 21:57, Umesh Nerlige Ramappa wrote:
>>> On Fri, May 12, 2023 at 11:56:18AM +0100, Tvrtko Ursulin wrote:
>>>>
>>>> On 12/05/2023 02:08, Dixit, Ashutosh wrote:
>>>>> On Fri, 05 May 2023 17:58:15 -0700, Umesh Nerlige Ramappa wrote:
>>>>>>
>>>>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>>>
>>>>>> Reserve some bits in the counter config namespace which will carry the
>>>>>> tile id and prepare the code to handle this.
>>>>>>
>>>>>> No per tile counters have been added yet.
>>>>>>
>>>>>> v2:
>>>>>> - Fix checkpatch issues
>>>>>> - Use 4 bits for gt id in non-engine counters. Drop FIXME.
>>>>>> - Set MAX GTs to 4. Drop FIXME.
>>>>>>
>>>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>>> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
>>
>> 8<
>>
>>>>>> +static u64 frequency_enabled_mask(void)
>>>>>> +{
>>>>>> +    unsigned int i;
>>>>>> +    u64 mask = 0;
>>>>>> +
>>>>>> +    for (i = 0; i < I915_PMU_MAX_GTS; i++)
>>>>>> +        mask |= config_mask(__I915_PMU_ACTUAL_FREQUENCY(i)) |
>>>>>> +            config_mask(__I915_PMU_REQUESTED_FREQUENCY(i));
>>>>>> +
>>>>>> +    return mask;
>>>>>> +}
>>>>>> +
>>>>>>   static bool pmu_needs_timer(struct i915_pmu *pmu, bool gpu_active)
>>>>>>   {
>>>>>>      struct drm_i915_private *i915 = container_of(pmu, typeof(*i915), pmu);
>>>>>> @@ -120,9 +147,7 @@ static bool pmu_needs_timer(struct i915_pmu *pmu, bool gpu_active)
>>>>>>       * Mask out all the ones which do not need the timer, or in
>>>>>>       * other words keep all the ones that could need the timer.
>>>>>>       */
>>>>>> -    enable &= config_mask(I915_PMU_ACTUAL_FREQUENCY) |
>>>>>> -          config_mask(I915_PMU_REQUESTED_FREQUENCY) |
>>>>>> -          ENGINE_SAMPLE_MASK;
>>>>>> +    enable &= frequency_enabled_mask() | ENGINE_SAMPLE_MASK;
>>>>>
>>>>> u32 enable & u64 frequency_enabled_mask
>>>>>
>>>>> ugly but ok I guess? Or change enable to u64?
>>>
>>> making pmu->enable u64 as well as other places where it is assigned to
>>> local variables.
>>>
>>>>
>>>> Hmm.. yes very ugly. Could have been an accident which happened to work
>>>> because there is a single timer (not per tile).
>>>
>>> Happened to work because the frequency mask does not spill over to the
>>> upper 32 bits (even for multi tile).
>>>
>>>>
>>>> Similar issue in frequency_sampling_enabled too. Gt_id argument to it
>>>> seems pointless.
>>>
>>> Not sure why it's pointless. We need the gt_id to determine the right
>>> mask for that specific gt. If it's not enabled, then we just return
>>> without pm_get and async put (like you mention later).
>>> And this piece of code is called within for_each_gt.
>>
>> I think I got a little confused cross referencing the code and patches last
>> week and did not mentally see all the changes.
>>
>> Because the hunk in other_bit() is correctly adding support for per gt bits.
>>
>> The layout of pmu->enable ends up like this:
>>
>> bits  0 -  2: engine events
>> bits  3 -  5: gt0 other events
>> bits  6 -  8: gt1 other events
>> bits  9 - 11: gt2 other events
>> bits 12 - 14: gt3 other events
> 
> Correct.
> 
>>
>>>> So I now think whole frequency_enabled_mask() is just pointless and
>>>> should be removed. And then pmu_needs_timer code can stay as is. Possibly
>>>> add a config_mask_32 helper which ensures no bits in upper 32 bits are
>>>> returned.
>>>>
>>>> That is if we are happy for the frequency_sampling_enabled returning
>>>> true for all gts, regardless of which ones actually have frequency
>>>> sampling enabled.
>>>>
>>>> Or if we want to implement it as I probably have intended, we will need
>>>> to add some gt bits into pmu->enable. Maybe reserve top four same as
>>>> with config counters.
>>>
>>> Nope. What you have here works just fine. pmu->enable should not include
>>> any gt id info. gt_id[63:60] is only a concept for pmu config sent by
>>> user.  config_mask and pmu->enable are i915 internal bookkeeping (bit
>>> masks) just to track what events need to be sampled.  The 'other' bit
>>> masks are a function of gt_id because we use gt_id to calculate a
>>> contiguous numerical value for these 'other' events. That's all. Once the
>>> numerical value is calculated, there is no need for gt_id because
>>> config_mask is BIT_ULL(numerical_value). Since the numerical values never
>>> exceeded 31 (even for multi-gts), everything worked even with 32 bit
>>> pmu->enable.
>>
>> Yep.
>>
>> So the question then is why make pmu->enable u64?
> 
> The only reason was simplicity, since a lot of the existing code already
> assumes u64.
> 
> E.g. if we keep pmu->enable u32, we would have to do the following:
> 
> * Change config_mask() return type to u32 (in frequency_sampling_enabled(),
>    we have 'pmu->enable & config_mask()')
> * Change frequency_enabled_mask() return type to u32 (again uses
>    config_mask() so if we change config_mask() to u32 we change return type
>    here too)
> * In i915_pmu_enable(), change 'pmu->enable |= BIT_ULL(bit)' to
>    'pmu->enable |= BIT(bit)'
> 
> So yes, if we think we should be using pmu->enable u32, let's change this
> to be consistent everywhere.
> 
>> Instead frequency_enabled_mask() should be made u32 since the bitwise or
>> composition of config_masks() is guaranteed to fit.
>>
>> At most it can have an internal u64 for the mask, assert upper_32_bits are
>> zero and return lower_32_bits.
>>
>> Did I get it right this time round? :)
> 
> Yes, though we'd have to make the config_mask() type change above to make
> frequency_enabled_mask() u32. Or we can just keep everything u64. Let's
> decide one way or the other and close this. It seems Tvrtko is leaning
> towards making pmu->enable and frequency_enabled_mask() u32?

I think so. Since it seems u64 for config_mask() was a mistake from the
start, let's fix it up. I can send a patch to do that if that's easier?

Regards,

Tvrtko
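A minimal sketch of the u32 frequency_enabled_mask() variant suggested above: accumulate in a u64, assert that the upper 32 bits are zero, and return the lower 32 bits. It assumes the same hypothetical layout (three engine bits, three 'other' bits per gt, actual and requested frequency at 'other' indices 0 and 1); the `sketch_*` names are illustrative only, and in the kernel the checks would use upper_32_bits()/lower_32_bits() with a WARN_ON or similar rather than assert().

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_GT 4u

/* Hypothetical: bit of the actual (0) or requested (1) frequency event. */
static unsigned int sketch_freq_bit(unsigned int gt_id, int requested)
{
	return 3u + gt_id * 3u + (requested ? 1u : 0u);
}

/*
 * u32 variant: build the mask in a u64, check nothing spilled into the
 * upper half, then hand back only the lower half.
 */
uint32_t sketch_frequency_enabled_mask(void)
{
	uint64_t mask = 0;
	unsigned int gt;

	for (gt = 0; gt < SKETCH_MAX_GT; gt++)
		mask |= (UINT64_C(1) << sketch_freq_bit(gt, 0)) |
			(UINT64_C(1) << sketch_freq_bit(gt, 1));

	assert((uint32_t)(mask >> 32) == 0); /* upper_32_bits(mask) == 0 */
	return (uint32_t)mask;               /* lower_32_bits(mask) */
}
```

With that, pmu->enable can remain u32 and the `enable &= frequency_enabled_mask() | ENGINE_SAMPLE_MASK` line no longer mixes a u32 with the u64 returned by frequency_enabled_mask().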


* Re: [Intel-gfx] [PATCH 4/6] drm/i915/pmu: Add reference counting to the sampling timer
  2023-05-16  7:12               ` Tvrtko Ursulin
@ 2023-05-16 16:29                 ` Dixit, Ashutosh
  0 siblings, 0 replies; 45+ messages in thread
From: Dixit, Ashutosh @ 2023-05-16 16:29 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

On Tue, 16 May 2023 00:12:45 -0700, Tvrtko Ursulin wrote:
>
> On 15/05/2023 22:24, Dixit, Ashutosh wrote:
> > On Mon, 15 May 2023 02:52:35 -0700, Tvrtko Ursulin wrote:
> >>
> >> On 13/05/2023 00:44, Umesh Nerlige Ramappa wrote:
> >>> On Fri, May 12, 2023 at 04:20:19PM -0700, Dixit, Ashutosh wrote:
> >>>> On Fri, 12 May 2023 15:44:00 -0700, Umesh Nerlige Ramappa wrote:
> >>>>>
> >>>>> On Fri, May 12, 2023 at 03:29:03PM -0700, Dixit, Ashutosh wrote:
> >>>>>> On Fri, 05 May 2023 17:58:14 -0700, Umesh Nerlige Ramappa wrote:
> >>>>>>>
> >>>>>>
> >>>>>> Hi Umesh/Tvrtko,
> >>>>>>
> >>>>>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>>>>>>
> > >>>>>>> We do not want to have timers per tile and waste CPU cycles and
> > >>>>>>> energy via multiple wake-up sources, for a relatively un-important task of PMU
> > >>>>>>> sampling, so keeping a single timer works well. But we also do not want
> > >>>>>>> the first GT which goes idle to turn off the timer.
> > >>>>>>>
> > >>>>>>> Add some reference counting, via a mask of unparked GTs, to solve this.
> > >>>>>>>
> > >>>>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > >>>>>>> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
> >>>>>>> ---
> >>>>>>>    drivers/gpu/drm/i915/i915_pmu.c | 12 ++++++++++--
> >>>>>>>    drivers/gpu/drm/i915/i915_pmu.h |  4 ++++
> >>>>>>>    2 files changed, 14 insertions(+), 2 deletions(-)
> >>>>>>>
> > >>>>>>> diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
> >>>>>>> index 2b63ee31e1b3..669a42e44082 100644
> >>>>>>> --- a/drivers/gpu/drm/i915/i915_pmu.c
> >>>>>>> +++ b/drivers/gpu/drm/i915/i915_pmu.c
> >>>>>>> @@ -251,7 +251,9 @@ void i915_pmu_gt_parked(struct intel_gt *gt)
> >>>>>>>       * Signal sampling timer to stop if only engine events are
> >>>>> enabled and
> >>>>>>>       * GPU went idle.
> >>>>>>>       */
> >>>>>>> -    pmu->timer_enabled = pmu_needs_timer(pmu, false);
> >>>>>>> +    pmu->unparked &= ~BIT(gt->info.id);
> >>>>>>> +    if (pmu->unparked == 0)
> >>>>>>> +        pmu->timer_enabled = pmu_needs_timer(pmu, false);
> >>>>>>>
> >>>>>>>      spin_unlock_irq(&pmu->lock);
> >>>>>>>    }
> >>>>>>> @@ -268,7 +270,10 @@ void i915_pmu_gt_unparked(struct intel_gt *gt)
> >>>>>>>      /*
> >>>>>>>       * Re-enable sampling timer when GPU goes active.
> >>>>>>>       */
> >>>>>>> -    __i915_pmu_maybe_start_timer(pmu);
> >>>>>>> +    if (pmu->unparked == 0)
> >>>>>>> +        __i915_pmu_maybe_start_timer(pmu);
> >>>>>>> +
> >>>>>>> +    pmu->unparked |= BIT(gt->info.id);
> >>>>>>>
> >>>>>>>      spin_unlock_irq(&pmu->lock);
> >>>>>>>    }
> >>>>>>> @@ -438,6 +443,9 @@ static enum hrtimer_restart i915_sample(struct
> >>>>> hrtimer *hrtimer)
> >>>>>>>       */
> >>>>>>>
> >>>>>>>      for_each_gt(gt, i915, i) {
> >>>>>>> +        if (!(pmu->unparked & BIT(i)))
> >>>>>>> +            continue;
> >>>>>>> +
> >>>>>>
> > >>>>>> This is not correct. In this series we are at least sampling frequencies
> > >>>>>> (calling frequency_sample) even when GT is parked. So these 3 lines should be
> > >>>>>> deleted. engines_sample will get called and will return without doing
> > >>>>>> anything if engine events are disabled.
> >>>>>
> >>>>> Not sure I understand. This is checking pmu->'un'parked bits.
> >>>>
> > >>>> Sorry, my bad. Not "engines_sample will get called and will return without
> > >>>> doing anything if engine events are disabled" but "engines_sample will get
> > >>>> called and will return without doing anything if GT is not awake". This is
> > >>>> the same as the previous behavior before this series.
> >>>>
> >>>> Umesh and I discussed this but writing this out in case Tvrtko takes a
> >>>> look.
> >>>
> > >>> Sounds good. Dropping the check here in the new revision.
> >
> > Hi Tvrtko,
> >
> >> I think it is safe to not have the check, but I didn't quite understand the
> >> "this is not correct" part. I can only see the argument that it could be
> >> redundant, not that it is incorrect.
> >
> > I said that it looks incorrect to me because in this series we are still
> > sampling freq when gt is parked and we would be skipping that if we
> > included:
> >		if (!(pmu->unparked & BIT(i)))
> >			continue;
>
> Ah okay. We concluded in your upstream patch that this looks like an omission.
>
> >> In which case I think it had better stay, since, given this gets called at
> >> 200Hz, it is way more efficient than the *atomic* atomic_inc_not_zero
> >> (from intel_wakeref_get_if_active).
> >
> > As far as efficiency goes, once we merge the patch below (I have a v2 based
> > on your suggestion but am waiting until Umesh's series gets merged):
> >
> > https://patchwork.freedesktop.org/series/117658/
> >
> > this will turn off the timer itself, which is even more efficient than
> > the above code, where the timer keeps running and we just skip. So once
> > the patch linked above is merged, the above code will be truly redundant.
> > That was the second reason I said to delete it.
>
> On multi-tile where not all tiles are being looked at it still pays to
> avoid the atomic check.

Ah ok I overlooked there was a single timer shared between multiple tiles.

> It doesn't apply to tools like intel_gpu_top, which
> monitor all tiles, but still I think there isn't any harm in having the
> fast check.

I think in that case the simplest thing would be to just put this code back
(as it was in Tvrtko's original patch) and not worry about sampling the
freqs when gt is parked, since we want to stop doing that anyway.

So let's just put this code back.

Thanks.
--
Ashutosh
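A minimal single-threaded model of the reference counting in this patch: a mask of unparked gts lets only the first gt to unpark start the shared timer and only the last gt to park stop it, while the restored `unparked` check gives the sampler a cheap plain-load skip for idle tiles before any atomic wakeref operation. All `sketch_*` names and the two `*_enabled` flags (a stand-in for pmu_needs_timer()) are hypothetical; the driver does this under pmu->lock and drives an hrtimer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model; the driver holds pmu->lock around these updates. */
struct sketch_pmu {
	uint32_t unparked;   /* bit N set => gt N is awake */
	bool timer_enabled;  /* models the shared sampling hrtimer */
	bool freq_enabled;   /* frequency events need sampling even when idle */
	bool engine_enabled; /* engine events need sampling only when active */
};

/* Stand-in for pmu_needs_timer(pmu, gpu_active). */
static bool sketch_needs_timer(const struct sketch_pmu *pmu, bool gpu_active)
{
	return pmu->freq_enabled || (gpu_active && pmu->engine_enabled);
}

void sketch_gt_unparked(struct sketch_pmu *pmu, unsigned int gt_id)
{
	/* Only the first gt to wake up may need to (re)start the timer. */
	if (pmu->unparked == 0 && !pmu->timer_enabled)
		pmu->timer_enabled = sketch_needs_timer(pmu, true);

	pmu->unparked |= UINT32_C(1) << gt_id;
}

void sketch_gt_parked(struct sketch_pmu *pmu, unsigned int gt_id)
{
	pmu->unparked &= ~(UINT32_C(1) << gt_id);

	/* Only once the last gt goes idle may the timer be stopped. */
	if (pmu->unparked == 0)
		pmu->timer_enabled = sketch_needs_timer(pmu, false);
}

/* One timer tick; returns how many gts were actually sampled. */
unsigned int sketch_sample(const struct sketch_pmu *pmu, unsigned int num_gt)
{
	unsigned int i, sampled = 0;

	for (i = 0; i < num_gt; i++) {
		/* Cheap plain-load skip for parked gts, before any atomic. */
		if (!(pmu->unparked & (UINT32_C(1) << i)))
			continue;

		sampled++; /* engine and frequency sampling would go here */
	}

	return sampled;
}
```

Note that with the skip in place, frequency is not sampled on parked gts, which matches the conclusion above to stop doing that anyway.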


end of thread, other threads:[~2023-05-16 16:29 UTC | newest]

Thread overview: 45+ messages
2023-05-06  0:58 [Intel-gfx] [PATCH 0/6] Add MTL PMU support for multi-gt Umesh Nerlige Ramappa
2023-05-06  0:58 ` [Intel-gfx] [PATCH 1/6] drm/i915/pmu: Support PMU for all engines Umesh Nerlige Ramappa
2023-05-08 17:52   ` Umesh Nerlige Ramappa
2023-05-09 12:26     ` Tvrtko Ursulin
2023-05-06  0:58 ` [Intel-gfx] [PATCH 2/6] drm/i915/pmu: Skip sampling engines with no enabled counters Umesh Nerlige Ramappa
2023-05-08 17:53   ` Umesh Nerlige Ramappa
2023-05-06  0:58 ` [Intel-gfx] [PATCH 3/6] drm/i915/pmu: Transform PMU parking code to be GT based Umesh Nerlige Ramappa
2023-05-08 17:55   ` Umesh Nerlige Ramappa
2023-05-09 15:10     ` Umesh Nerlige Ramappa
2023-05-06  0:58 ` [Intel-gfx] [PATCH 4/6] drm/i915/pmu: Add reference counting to the sampling timer Umesh Nerlige Ramappa
2023-05-08 17:58   ` Umesh Nerlige Ramappa
2023-05-09 17:25   ` Dixit, Ashutosh
2023-05-10  6:02     ` Dixit, Ashutosh
2023-05-12 22:29   ` Dixit, Ashutosh
2023-05-12 22:44     ` Umesh Nerlige Ramappa
2023-05-12 23:20       ` Dixit, Ashutosh
2023-05-12 23:44         ` Umesh Nerlige Ramappa
2023-05-15  9:52           ` Tvrtko Ursulin
2023-05-15 21:24             ` Dixit, Ashutosh
2023-05-16  7:12               ` Tvrtko Ursulin
2023-05-16 16:29                 ` Dixit, Ashutosh
2023-05-06  0:58 ` [Intel-gfx] [PATCH 5/6] drm/i915/pmu: Prepare for multi-tile non-engine counters Umesh Nerlige Ramappa
2023-05-08 18:07   ` Umesh Nerlige Ramappa
2023-05-12  1:08   ` Dixit, Ashutosh
2023-05-12 10:56     ` Tvrtko Ursulin
2023-05-12 20:57       ` Umesh Nerlige Ramappa
2023-05-12 22:37         ` Umesh Nerlige Ramappa
2023-05-13  1:09         ` Dixit, Ashutosh
2023-05-15 10:10         ` Tvrtko Ursulin
2023-05-15 22:07           ` Dixit, Ashutosh
2023-05-16  8:35             ` Tvrtko Ursulin
2023-05-06  0:58 ` [Intel-gfx] [PATCH 6/6] drm/i915/pmu: Export counters from all tiles Umesh Nerlige Ramappa
2023-05-08 18:08   ` Umesh Nerlige Ramappa
2023-05-09 12:38   ` Tvrtko Ursulin
2023-05-11 18:57   ` Dixit, Ashutosh
2023-05-12 10:57     ` Tvrtko Ursulin
2023-05-12 17:08       ` Dixit, Ashutosh
2023-05-12 18:53         ` Umesh Nerlige Ramappa
2023-05-12 20:10           ` Dixit, Ashutosh
2023-05-06  2:20 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for Add MTL PMU support for multi-gt (rev2) Patchwork
2023-05-06  2:21 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
2023-05-06  2:38 ` [Intel-gfx] ✗ Fi.CI.BAT: failure " Patchwork
  -- strict thread matches above, loose matches on Subject: below --
2023-05-13  1:55 [Intel-gfx] [PATCH 0/6] Add MTL PMU support for multi-gt Umesh Nerlige Ramappa
2023-05-13  1:55 ` [Intel-gfx] [PATCH 4/6] drm/i915/pmu: Add reference counting to the sampling timer Umesh Nerlige Ramappa
2023-05-13  3:01   ` Dixit, Ashutosh
2023-05-15  6:44 [Intel-gfx] [PATCH v4 0/6] Add MTL PMU support for multi-gt Umesh Nerlige Ramappa
2023-05-15  6:44 ` [Intel-gfx] [PATCH 4/6] drm/i915/pmu: Add reference counting to the sampling timer Umesh Nerlige Ramappa
