public inbox for intel-gfx@lists.freedesktop.org
 help / color / mirror / Atom feed
* [PATCH 1/3] drm/i915: Use consistent forcewake auto-release timeout across kernel configs
@ 2016-04-04 16:51 Tvrtko Ursulin
  2016-04-04 16:51 ` [PATCH 2/3] drm/i915: Simplify for_each_fw_domain iterators Tvrtko Ursulin
                   ` (5 more replies)
  0 siblings, 6 replies; 23+ messages in thread
From: Tvrtko Ursulin @ 2016-04-04 16:51 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Current implementation releases the forcewake at any time between
straight away, and one jiffie from the last put, or first automatic
grab.

This does not sound like what was desired since jiffies are typically
between 1 and 10ms depending on the kernel configuration.

Change the auto-release mechanism to use hrtimers and set the timeout
to 1ms with a 1ms of slack. This should make the GPU power consistent
across kernel configs and the slack should enable some timer coallescing
where multiple force-wake domains exist, or with unrelated timers.

For GlBench/T-Rex this decreases the number of forcewake releases from
~480 to ~300 per second, and for a heavy combined OGL/OCL test from
~670 to ~360.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h     |  2 +-
 drivers/gpu/drm/i915/intel_uncore.c | 25 ++++++++++++++++---------
 2 files changed, 17 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index dd187727c813..7d4c704d7d75 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -666,7 +666,7 @@ struct intel_uncore {
 		struct drm_i915_private *i915;
 		enum forcewake_domain_id id;
 		unsigned wake_count;
-		struct timer_list timer;
+		struct hrtimer timer;
 		i915_reg_t reg_set;
 		u32 val_set;
 		u32 val_clear;
diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
index ac1c545436af..76ac990de354 100644
--- a/drivers/gpu/drm/i915/intel_uncore.c
+++ b/drivers/gpu/drm/i915/intel_uncore.c
@@ -60,7 +60,11 @@ fw_domain_reset(const struct intel_uncore_forcewake_domain *d)
 static inline void
 fw_domain_arm_timer(struct intel_uncore_forcewake_domain *d)
 {
-	mod_timer_pinned(&d->timer, jiffies + 1);
+	d->wake_count++;
+	hrtimer_start_range_ns(&d->timer,
+			       ktime_set(0, NSEC_PER_MSEC),
+			       NSEC_PER_MSEC,
+			       HRTIMER_MODE_REL);
 }
 
 static inline void
@@ -224,9 +228,11 @@ static int __gen6_gt_wait_for_fifo(struct drm_i915_private *dev_priv)
 	return ret;
 }
 
-static void intel_uncore_fw_release_timer(unsigned long arg)
+static enum hrtimer_restart
+intel_uncore_fw_release_timer(struct hrtimer *timer)
 {
-	struct intel_uncore_forcewake_domain *domain = (void *)arg;
+	struct intel_uncore_forcewake_domain *domain =
+	       container_of(timer, struct intel_uncore_forcewake_domain, timer);
 	unsigned long irqflags;
 
 	assert_rpm_device_not_suspended(domain->i915);
@@ -240,6 +246,8 @@ static void intel_uncore_fw_release_timer(unsigned long arg)
 							  1 << domain->id);
 
 	spin_unlock_irqrestore(&domain->i915->uncore.lock, irqflags);
+
+	return HRTIMER_NORESTART;
 }
 
 void intel_uncore_forcewake_reset(struct drm_device *dev, bool restore)
@@ -259,16 +267,16 @@ void intel_uncore_forcewake_reset(struct drm_device *dev, bool restore)
 		active_domains = 0;
 
 		for_each_fw_domain(domain, dev_priv, id) {
-			if (del_timer_sync(&domain->timer) == 0)
+			if (hrtimer_cancel(&domain->timer) == 0)
 				continue;
 
-			intel_uncore_fw_release_timer((unsigned long)domain);
+			intel_uncore_fw_release_timer(&domain->timer);
 		}
 
 		spin_lock_irqsave(&dev_priv->uncore.lock, irqflags);
 
 		for_each_fw_domain(domain, dev_priv, id) {
-			if (timer_pending(&domain->timer))
+			if (hrtimer_active(&domain->timer))
 				active_domains |= (1 << id);
 		}
 
@@ -491,7 +499,6 @@ static void __intel_uncore_forcewake_put(struct drm_i915_private *dev_priv,
 		if (--domain->wake_count)
 			continue;
 
-		domain->wake_count++;
 		fw_domain_arm_timer(domain);
 	}
 }
@@ -732,7 +739,6 @@ static inline void __force_wake_auto(struct drm_i915_private *dev_priv,
 			continue;
 		}
 
-		domain->wake_count++;
 		fw_domain_arm_timer(domain);
 	}
 
@@ -1150,7 +1156,8 @@ static void fw_domain_init(struct drm_i915_private *dev_priv,
 	d->i915 = dev_priv;
 	d->id = domain_id;
 
-	setup_timer(&d->timer, intel_uncore_fw_release_timer, (unsigned long)d);
+	hrtimer_init(&d->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+	d->timer.function = intel_uncore_fw_release_timer;
 
 	dev_priv->uncore.fw_domains |= (1 << domain_id);
 
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 2/3] drm/i915: Simplify for_each_fw_domain iterators
  2016-04-04 16:51 [PATCH 1/3] drm/i915: Use consistent forcewake auto-release timeout across kernel configs Tvrtko Ursulin
@ 2016-04-04 16:51 ` Tvrtko Ursulin
  2016-04-04 19:00   ` Chris Wilson
  2016-04-04 19:14   ` Dave Gordon
  2016-04-04 16:51 ` [PATCH 3/3] drm/i915: Do not serialize forcewake acquire across domains Tvrtko Ursulin
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 23+ messages in thread
From: Tvrtko Ursulin @ 2016-04-04 16:51 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

As the vast majority of users does not use the domain id variable,
we can eliminate it from the iterator and also change the latter
using the same principle as was recently done for for_each_engine.

For a couple of callers which do need the domain id, or the domain
mask, store the later in the domain itself at init time and use
both through it.

Result is clearer code and smaller generated binary, especially
in the tight fw get/put loops. Also, relationship between domain
id and mask is no longer assumed in the macro.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c |  5 ++---
 drivers/gpu/drm/i915/i915_drv.h     | 17 +++++++-------
 drivers/gpu/drm/i915/intel_uncore.c | 45 +++++++++++++++++--------------------
 3 files changed, 32 insertions(+), 35 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index a2e3af02c292..06fce014d0b4 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1464,12 +1464,11 @@ static int i915_forcewake_domains(struct seq_file *m, void *data)
 	struct drm_device *dev = node->minor->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct intel_uncore_forcewake_domain *fw_domain;
-	int i;
 
 	spin_lock_irq(&dev_priv->uncore.lock);
-	for_each_fw_domain(fw_domain, dev_priv, i) {
+	for_each_fw_domain(fw_domain, dev_priv) {
 		seq_printf(m, "%s.wake_count = %u\n",
-			   intel_uncore_forcewake_domain_to_str(i),
+			   intel_uncore_forcewake_domain_to_str(fw_domain->id),
 			   fw_domain->wake_count);
 	}
 	spin_unlock_irq(&dev_priv->uncore.lock);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 7d4c704d7d75..160f980f0368 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -665,6 +665,7 @@ struct intel_uncore {
 	struct intel_uncore_forcewake_domain {
 		struct drm_i915_private *i915;
 		enum forcewake_domain_id id;
+		enum forcewake_domains mask;
 		unsigned wake_count;
 		struct hrtimer timer;
 		i915_reg_t reg_set;
@@ -679,14 +680,14 @@ struct intel_uncore {
 };
 
 /* Iterate over initialised fw domains */
-#define for_each_fw_domain_mask(domain__, mask__, dev_priv__, i__) \
-	for ((i__) = 0, (domain__) = &(dev_priv__)->uncore.fw_domain[0]; \
-	     (i__) < FW_DOMAIN_ID_COUNT; \
-	     (i__)++, (domain__) = &(dev_priv__)->uncore.fw_domain[i__]) \
-		for_each_if (((mask__) & (dev_priv__)->uncore.fw_domains) & (1 << (i__)))
-
-#define for_each_fw_domain(domain__, dev_priv__, i__) \
-	for_each_fw_domain_mask(domain__, FORCEWAKE_ALL, dev_priv__, i__)
+#define for_each_fw_domain_mask(domain__, mask__, dev_priv__) \
+	for ((domain__) = &(dev_priv__)->uncore.fw_domain[0]; \
+	     (domain__) < &(dev_priv__)->uncore.fw_domain[FW_DOMAIN_ID_COUNT]; \
+	     (domain__)++) \
+		for_each_if ((mask__) & (domain__)->mask)
+
+#define for_each_fw_domain(domain__, dev_priv__) \
+	for_each_fw_domain_mask(domain__, FORCEWAKE_ALL, dev_priv__)
 
 #define CSR_VERSION(major, minor)	((major) << 16 | (minor))
 #define CSR_VERSION_MAJOR(version)	((version) >> 16)
diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
index 76ac990de354..49edd641b434 100644
--- a/drivers/gpu/drm/i915/intel_uncore.c
+++ b/drivers/gpu/drm/i915/intel_uncore.c
@@ -111,9 +111,8 @@ static void
 fw_domains_get(struct drm_i915_private *dev_priv, enum forcewake_domains fw_domains)
 {
 	struct intel_uncore_forcewake_domain *d;
-	enum forcewake_domain_id id;
 
-	for_each_fw_domain_mask(d, fw_domains, dev_priv, id) {
+	for_each_fw_domain_mask(d, fw_domains, dev_priv) {
 		fw_domain_wait_ack_clear(d);
 		fw_domain_get(d);
 		fw_domain_wait_ack(d);
@@ -124,9 +123,8 @@ static void
 fw_domains_put(struct drm_i915_private *dev_priv, enum forcewake_domains fw_domains)
 {
 	struct intel_uncore_forcewake_domain *d;
-	enum forcewake_domain_id id;
 
-	for_each_fw_domain_mask(d, fw_domains, dev_priv, id) {
+	for_each_fw_domain_mask(d, fw_domains, dev_priv) {
 		fw_domain_put(d);
 		fw_domain_posting_read(d);
 	}
@@ -136,10 +134,9 @@ static void
 fw_domains_posting_read(struct drm_i915_private *dev_priv)
 {
 	struct intel_uncore_forcewake_domain *d;
-	enum forcewake_domain_id id;
 
 	/* No need to do for all, just do for first found */
-	for_each_fw_domain(d, dev_priv, id) {
+	for_each_fw_domain(d, dev_priv) {
 		fw_domain_posting_read(d);
 		break;
 	}
@@ -149,12 +146,11 @@ static void
 fw_domains_reset(struct drm_i915_private *dev_priv, enum forcewake_domains fw_domains)
 {
 	struct intel_uncore_forcewake_domain *d;
-	enum forcewake_domain_id id;
 
 	if (dev_priv->uncore.fw_domains == 0)
 		return;
 
-	for_each_fw_domain_mask(d, fw_domains, dev_priv, id)
+	for_each_fw_domain_mask(d, fw_domains, dev_priv)
 		fw_domain_reset(d);
 
 	fw_domains_posting_read(dev_priv);
@@ -256,7 +252,6 @@ void intel_uncore_forcewake_reset(struct drm_device *dev, bool restore)
 	unsigned long irqflags;
 	struct intel_uncore_forcewake_domain *domain;
 	int retry_count = 100;
-	enum forcewake_domain_id id;
 	enum forcewake_domains fw = 0, active_domains;
 
 	/* Hold uncore.lock across reset to prevent any register access
@@ -266,7 +261,7 @@ void intel_uncore_forcewake_reset(struct drm_device *dev, bool restore)
 	while (1) {
 		active_domains = 0;
 
-		for_each_fw_domain(domain, dev_priv, id) {
+		for_each_fw_domain(domain, dev_priv) {
 			if (hrtimer_cancel(&domain->timer) == 0)
 				continue;
 
@@ -275,9 +270,9 @@ void intel_uncore_forcewake_reset(struct drm_device *dev, bool restore)
 
 		spin_lock_irqsave(&dev_priv->uncore.lock, irqflags);
 
-		for_each_fw_domain(domain, dev_priv, id) {
+		for_each_fw_domain(domain, dev_priv) {
 			if (hrtimer_active(&domain->timer))
-				active_domains |= (1 << id);
+				active_domains |= domain->mask;
 		}
 
 		if (active_domains == 0)
@@ -294,9 +289,9 @@ void intel_uncore_forcewake_reset(struct drm_device *dev, bool restore)
 
 	WARN_ON(active_domains);
 
-	for_each_fw_domain(domain, dev_priv, id)
+	for_each_fw_domain(domain, dev_priv)
 		if (domain->wake_count)
-			fw |= 1 << id;
+			fw |= domain->mask;
 
 	if (fw)
 		dev_priv->uncore.funcs.force_wake_put(dev_priv, fw);
@@ -418,16 +413,15 @@ static void __intel_uncore_forcewake_get(struct drm_i915_private *dev_priv,
 					 enum forcewake_domains fw_domains)
 {
 	struct intel_uncore_forcewake_domain *domain;
-	enum forcewake_domain_id id;
 
 	if (!dev_priv->uncore.funcs.force_wake_get)
 		return;
 
 	fw_domains &= dev_priv->uncore.fw_domains;
 
-	for_each_fw_domain_mask(domain, fw_domains, dev_priv, id) {
+	for_each_fw_domain_mask(domain, fw_domains, dev_priv) {
 		if (domain->wake_count++)
-			fw_domains &= ~(1 << id);
+			fw_domains &= ~domain->mask;
 	}
 
 	if (fw_domains)
@@ -485,14 +479,13 @@ static void __intel_uncore_forcewake_put(struct drm_i915_private *dev_priv,
 					 enum forcewake_domains fw_domains)
 {
 	struct intel_uncore_forcewake_domain *domain;
-	enum forcewake_domain_id id;
 
 	if (!dev_priv->uncore.funcs.force_wake_put)
 		return;
 
 	fw_domains &= dev_priv->uncore.fw_domains;
 
-	for_each_fw_domain_mask(domain, fw_domains, dev_priv, id) {
+	for_each_fw_domain_mask(domain, fw_domains, dev_priv) {
 		if (WARN_ON(domain->wake_count == 0))
 			continue;
 
@@ -546,12 +539,11 @@ void intel_uncore_forcewake_put__locked(struct drm_i915_private *dev_priv,
 void assert_forcewakes_inactive(struct drm_i915_private *dev_priv)
 {
 	struct intel_uncore_forcewake_domain *domain;
-	enum forcewake_domain_id id;
 
 	if (!dev_priv->uncore.funcs.force_wake_get)
 		return;
 
-	for_each_fw_domain(domain, dev_priv, id)
+	for_each_fw_domain(domain, dev_priv)
 		WARN_ON(domain->wake_count);
 }
 
@@ -727,15 +719,14 @@ static inline void __force_wake_auto(struct drm_i915_private *dev_priv,
 				     enum forcewake_domains fw_domains)
 {
 	struct intel_uncore_forcewake_domain *domain;
-	enum forcewake_domain_id id;
 
 	if (WARN_ON(!fw_domains))
 		return;
 
 	/* Ideally GCC would be constant-fold and eliminate this loop */
-	for_each_fw_domain_mask(domain, fw_domains, dev_priv, id) {
+	for_each_fw_domain_mask(domain, fw_domains, dev_priv) {
 		if (domain->wake_count) {
-			fw_domains &= ~(1 << id);
+			fw_domains &= ~domain->mask;
 			continue;
 		}
 
@@ -1156,6 +1147,12 @@ static void fw_domain_init(struct drm_i915_private *dev_priv,
 	d->i915 = dev_priv;
 	d->id = domain_id;
 
+	BUILD_BUG_ON(FORCEWAKE_RENDER != (1 << FW_DOMAIN_ID_RENDER));
+	BUILD_BUG_ON(FORCEWAKE_BLITTER != (1 << FW_DOMAIN_ID_BLITTER));
+	BUILD_BUG_ON(FORCEWAKE_MEDIA != (1 << FW_DOMAIN_ID_MEDIA));
+
+	d->mask = 1 << domain_id;
+
 	hrtimer_init(&d->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
 	d->timer.function = intel_uncore_fw_release_timer;
 
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 3/3] drm/i915: Do not serialize forcewake acquire across domains
  2016-04-04 16:51 [PATCH 1/3] drm/i915: Use consistent forcewake auto-release timeout across kernel configs Tvrtko Ursulin
  2016-04-04 16:51 ` [PATCH 2/3] drm/i915: Simplify for_each_fw_domain iterators Tvrtko Ursulin
@ 2016-04-04 16:51 ` Tvrtko Ursulin
  2016-04-04 19:07   ` Chris Wilson
  2016-04-04 18:58 ` [PATCH 1/3] drm/i915: Use consistent forcewake auto-release timeout across kernel configs Chris Wilson
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 23+ messages in thread
From: Tvrtko Ursulin @ 2016-04-04 16:51 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

On platforms with multiple forcewake domains it seems more efficient
to request all desired ones and then to wait for acks to avoid
needlessly serializing on each domain.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/intel_uncore.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
index 49edd641b434..03674c3cfaf7 100644
--- a/drivers/gpu/drm/i915/intel_uncore.c
+++ b/drivers/gpu/drm/i915/intel_uncore.c
@@ -115,8 +115,10 @@ fw_domains_get(struct drm_i915_private *dev_priv, enum forcewake_domains fw_doma
 	for_each_fw_domain_mask(d, fw_domains, dev_priv) {
 		fw_domain_wait_ack_clear(d);
 		fw_domain_get(d);
-		fw_domain_wait_ack(d);
 	}
+
+	for_each_fw_domain_mask(d, fw_domains, dev_priv)
+		fw_domain_wait_ack(d);
 }
 
 static void
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/3] drm/i915: Use consistent forcewake auto-release timeout across kernel configs
  2016-04-04 16:51 [PATCH 1/3] drm/i915: Use consistent forcewake auto-release timeout across kernel configs Tvrtko Ursulin
  2016-04-04 16:51 ` [PATCH 2/3] drm/i915: Simplify for_each_fw_domain iterators Tvrtko Ursulin
  2016-04-04 16:51 ` [PATCH 3/3] drm/i915: Do not serialize forcewake acquire across domains Tvrtko Ursulin
@ 2016-04-04 18:58 ` Chris Wilson
  2016-04-04 19:41   ` Dave Gordon
  2016-04-05  8:54   ` Tvrtko Ursulin
  2016-04-04 18:58 ` Dave Gordon
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 23+ messages in thread
From: Chris Wilson @ 2016-04-04 18:58 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: Intel-gfx

On Mon, Apr 04, 2016 at 05:51:09PM +0100, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> Current implementation releases the forcewake at any time between
> straight away, and one jiffie from the last put, or first automatic
> grab.

That isn't the problem though. The problem is that we set the timer on
first use rather than last use. All you are stating here is that by
lengthening the timeout on your system you reduce the number of times it
times out.

Having said that, the conversion to hrtimer seems sensible but to add
tracking of the last access as opposed to first we either fallback to
jiffie (in which case hrtimer is moot) or rely on ktime_get_raw() being
fast enough for every register write. Hmm, my usual response to that has
been if it matters we avoid the heavyweight macros and use the _FW
interface - or even raw readl/writel.

Could you try storing ktime_get_raw() on every access and rearming the
timer if it expires before last-access + min-period?
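
Something along these lines, perhaps (a rough sketch only, untested; the
last_access field is hypothetical and memory ordering is hand-waved):

	/* in the register access path */
	domain->last_access = ktime_get_raw();

	/* in the release callback, before dropping the domain */
	static enum hrtimer_restart
	fw_release_timer_sketch(struct hrtimer *timer)
	{
		struct intel_uncore_forcewake_domain *domain =
			container_of(timer, struct intel_uncore_forcewake_domain,
				     timer);

		/* Accessed again since the timer was armed? Push the expiry out. */
		if (ktime_before(ktime_get_raw(),
				 ktime_add_ns(domain->last_access, NSEC_PER_MSEC))) {
			hrtimer_forward_now(timer, ktime_set(0, NSEC_PER_MSEC));
			return HRTIMER_RESTART;
		}

		/* ...otherwise release forcewake exactly as today... */
		return HRTIMER_NORESTART;
	}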
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/3] drm/i915: Use consistent forcewake auto-release timeout across kernel configs
  2016-04-04 16:51 [PATCH 1/3] drm/i915: Use consistent forcewake auto-release timeout across kernel configs Tvrtko Ursulin
                   ` (2 preceding siblings ...)
  2016-04-04 18:58 ` [PATCH 1/3] drm/i915: Use consistent forcewake auto-release timeout across kernel configs Chris Wilson
@ 2016-04-04 18:58 ` Dave Gordon
  2016-04-05  6:26 ` ✗ Fi.CI.BAT: failure for series starting with [1/3] " Patchwork
  2016-04-05 11:01 ` [PATCH v2 1/3] " Tvrtko Ursulin
  5 siblings, 0 replies; 23+ messages in thread
From: Dave Gordon @ 2016-04-04 18:58 UTC (permalink / raw)
  To: Tvrtko Ursulin, Intel-gfx

On 04/04/16 17:51, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>
> Current implementation releases the forcewake at any time between
> straight away, and one jiffie from the last put, or first automatic
> grab.
>
> This does not sound like what was desired since jiffies are typically
> between 1 and 10ms depending on the kernel configuration.
>
> Change the auto-release mechanism to use hrtimers and set the timeout
> to 1ms with a 1ms of slack. This should make the GPU power consistent
> across kernel configs and the slack should enable some timer coallescing

"coalescing"

> where multiple force-wake domains exist, or with unrelated timers.
>
> For GlBench/T-Rex this decreases the number of forcewake releases from
> ~480 to ~300 per second, and for a heavy combined OGL/OCL test from
> ~670 to ~360.
>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

LGTM.

Reviewed-by: Dave Gordon <david.s.gordon@intel.com>

> ---
>   drivers/gpu/drm/i915/i915_drv.h     |  2 +-
>   drivers/gpu/drm/i915/intel_uncore.c | 25 ++++++++++++++++---------
>   2 files changed, 17 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index dd187727c813..7d4c704d7d75 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -666,7 +666,7 @@ struct intel_uncore {
>   		struct drm_i915_private *i915;
>   		enum forcewake_domain_id id;
>   		unsigned wake_count;
> -		struct timer_list timer;
> +		struct hrtimer timer;
>   		i915_reg_t reg_set;
>   		u32 val_set;
>   		u32 val_clear;
> diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
> index ac1c545436af..76ac990de354 100644
> --- a/drivers/gpu/drm/i915/intel_uncore.c
> +++ b/drivers/gpu/drm/i915/intel_uncore.c
> @@ -60,7 +60,11 @@ fw_domain_reset(const struct intel_uncore_forcewake_domain *d)
>   static inline void
>   fw_domain_arm_timer(struct intel_uncore_forcewake_domain *d)
>   {
> -	mod_timer_pinned(&d->timer, jiffies + 1);
> +	d->wake_count++;
> +	hrtimer_start_range_ns(&d->timer,
> +			       ktime_set(0, NSEC_PER_MSEC),
> +			       NSEC_PER_MSEC,
> +			       HRTIMER_MODE_REL);
>   }
>
>   static inline void
> @@ -224,9 +228,11 @@ static int __gen6_gt_wait_for_fifo(struct drm_i915_private *dev_priv)
>   	return ret;
>   }
>
> -static void intel_uncore_fw_release_timer(unsigned long arg)
> +static enum hrtimer_restart
> +intel_uncore_fw_release_timer(struct hrtimer *timer)
>   {
> -	struct intel_uncore_forcewake_domain *domain = (void *)arg;
> +	struct intel_uncore_forcewake_domain *domain =
> +	       container_of(timer, struct intel_uncore_forcewake_domain, timer);
>   	unsigned long irqflags;
>
>   	assert_rpm_device_not_suspended(domain->i915);
> @@ -240,6 +246,8 @@ static void intel_uncore_fw_release_timer(unsigned long arg)
>   							  1 << domain->id);
>
>   	spin_unlock_irqrestore(&domain->i915->uncore.lock, irqflags);
> +
> +	return HRTIMER_NORESTART;
>   }
>
>   void intel_uncore_forcewake_reset(struct drm_device *dev, bool restore)
> @@ -259,16 +267,16 @@ void intel_uncore_forcewake_reset(struct drm_device *dev, bool restore)
>   		active_domains = 0;
>
>   		for_each_fw_domain(domain, dev_priv, id) {
> -			if (del_timer_sync(&domain->timer) == 0)
> +			if (hrtimer_cancel(&domain->timer) == 0)
>   				continue;
>
> -			intel_uncore_fw_release_timer((unsigned long)domain);
> +			intel_uncore_fw_release_timer(&domain->timer);
>   		}
>
>   		spin_lock_irqsave(&dev_priv->uncore.lock, irqflags);
>
>   		for_each_fw_domain(domain, dev_priv, id) {
> -			if (timer_pending(&domain->timer))
> +			if (hrtimer_active(&domain->timer))
>   				active_domains |= (1 << id);
>   		}
>
> @@ -491,7 +499,6 @@ static void __intel_uncore_forcewake_put(struct drm_i915_private *dev_priv,
>   		if (--domain->wake_count)
>   			continue;
>
> -		domain->wake_count++;
>   		fw_domain_arm_timer(domain);
>   	}
>   }
> @@ -732,7 +739,6 @@ static inline void __force_wake_auto(struct drm_i915_private *dev_priv,
>   			continue;
>   		}
>
> -		domain->wake_count++;
>   		fw_domain_arm_timer(domain);
>   	}
>
> @@ -1150,7 +1156,8 @@ static void fw_domain_init(struct drm_i915_private *dev_priv,
>   	d->i915 = dev_priv;
>   	d->id = domain_id;
>
> -	setup_timer(&d->timer, intel_uncore_fw_release_timer, (unsigned long)d);
> +	hrtimer_init(&d->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
> +	d->timer.function = intel_uncore_fw_release_timer;
>
>   	dev_priv->uncore.fw_domains |= (1 << domain_id);
>
>

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/3] drm/i915: Simplify for_each_fw_domain iterators
  2016-04-04 16:51 ` [PATCH 2/3] drm/i915: Simplify for_each_fw_domain iterators Tvrtko Ursulin
@ 2016-04-04 19:00   ` Chris Wilson
  2016-04-04 19:14   ` Dave Gordon
  1 sibling, 0 replies; 23+ messages in thread
From: Chris Wilson @ 2016-04-04 19:00 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: Intel-gfx

On Mon, Apr 04, 2016 at 05:51:10PM +0100, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> As the vast majority of users does not use the domain id variable,
> we can eliminate it from the iterator and also change the latter
> using the same principle as was recently done for for_each_engine.
> 
> For a couple of callers which do need the domain id, or the domain
> mask, store the later in the domain itself at init time and use
> both through it.
> 
> Result is clearer code and smaller generated binary, especially
> in the tight fw get/put loops. Also, relationship between domain
> id and mask is no longer assumed in the macro.
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Hah, I made exactly (admittedly I didn't think of adding BUILD_BUG_ON)
the same patch just as you went off on holiday and weren't around to
comment!

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/3] drm/i915: Do not serialize forcewake acquire across domains
  2016-04-04 16:51 ` [PATCH 3/3] drm/i915: Do not serialize forcewake acquire across domains Tvrtko Ursulin
@ 2016-04-04 19:07   ` Chris Wilson
  2016-04-05  9:02     ` Tvrtko Ursulin
  0 siblings, 1 reply; 23+ messages in thread
From: Chris Wilson @ 2016-04-04 19:07 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: Intel-gfx

On Mon, Apr 04, 2016 at 05:51:11PM +0100, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> On platforms with multiple forcewake domains it seems more efficient
> to request all desired ones and then to wait for acks to avoid
> needlessly serializing on each domain.

Not convinced since we have more machines with one domain than two. What
I did was to compact the domains array so that we only iterated over the
known set - but that feels overkill when we only have two domains today.

For the same reason (only one machine with two domains), I didn't think
separate functions to iterate over one domain and another to iterate
over all was worth it.

What you can do though is remove an excess posting read from
fw_domains_put.

Compared to the cost of a register access (the spinlock irq mostly) the
iterator doesn't strike me as being that worthwhile an optimisation
target.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/3] drm/i915: Simplify for_each_fw_domain iterators
  2016-04-04 16:51 ` [PATCH 2/3] drm/i915: Simplify for_each_fw_domain iterators Tvrtko Ursulin
  2016-04-04 19:00   ` Chris Wilson
@ 2016-04-04 19:14   ` Dave Gordon
  2016-04-05  9:05     ` Tvrtko Ursulin
  1 sibling, 1 reply; 23+ messages in thread
From: Dave Gordon @ 2016-04-04 19:14 UTC (permalink / raw)
  To: Tvrtko Ursulin, Intel-gfx

On 04/04/16 17:51, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>
> As the vast majority of users does not use the domain id variable,

"do not use"

> we can eliminate it from the iterator and also change the latter
> using the same principle as was recently done for for_each_engine.
>
> For a couple of callers which do need the domain id, or the domain
> mask, store the later in the domain itself at init time and use
> both through it.

"For a couple of callers which do need the domain mask, store it
in the domain array (which already has the domain id), then both
can be retrieved thence." ?

> Result is clearer code and smaller generated binary, especially
> in the tight fw get/put loops. Also, relationship between domain
> id and mask is no longer assumed in the macro.

"in the macro or elsewhere" ?

> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

A few typos & clarifications above, plus a minor quibble about a macro
name (which you haven't actually changed, but might as well fix).
Otherwise LGTM, so

Reviewed-by: Dave Gordon <david.s.gordon@intel.com>

(even if you don't change the macro name)

> ---
>   drivers/gpu/drm/i915/i915_debugfs.c |  5 ++---
>   drivers/gpu/drm/i915/i915_drv.h     | 17 +++++++-------
>   drivers/gpu/drm/i915/intel_uncore.c | 45 +++++++++++++++++--------------------
>   3 files changed, 32 insertions(+), 35 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index a2e3af02c292..06fce014d0b4 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -1464,12 +1464,11 @@ static int i915_forcewake_domains(struct seq_file *m, void *data)
>   	struct drm_device *dev = node->minor->dev;
>   	struct drm_i915_private *dev_priv = dev->dev_private;
>   	struct intel_uncore_forcewake_domain *fw_domain;
> -	int i;
>
>   	spin_lock_irq(&dev_priv->uncore.lock);
> -	for_each_fw_domain(fw_domain, dev_priv, i) {
> +	for_each_fw_domain(fw_domain, dev_priv) {
>   		seq_printf(m, "%s.wake_count = %u\n",
> -			   intel_uncore_forcewake_domain_to_str(i),
> +			   intel_uncore_forcewake_domain_to_str(fw_domain->id),
>   			   fw_domain->wake_count);
>   	}
>   	spin_unlock_irq(&dev_priv->uncore.lock);
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 7d4c704d7d75..160f980f0368 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -665,6 +665,7 @@ struct intel_uncore {
>   	struct intel_uncore_forcewake_domain {
>   		struct drm_i915_private *i915;
>   		enum forcewake_domain_id id;
> +		enum forcewake_domains mask;

At present this mask will always have only one bit set, but I suppose 
there might be some utility in allowing multiple bits (virtual domains?)

>   		unsigned wake_count;
>   		struct hrtimer timer;
>   		i915_reg_t reg_set;
> @@ -679,14 +680,14 @@ struct intel_uncore {
>   };
>
>   /* Iterate over initialised fw domains */
> -#define for_each_fw_domain_mask(domain__, mask__, dev_priv__, i__) \
> -	for ((i__) = 0, (domain__) = &(dev_priv__)->uncore.fw_domain[0]; \
> -	     (i__) < FW_DOMAIN_ID_COUNT; \
> -	     (i__)++, (domain__) = &(dev_priv__)->uncore.fw_domain[i__]) \
> -		for_each_if (((mask__) & (dev_priv__)->uncore.fw_domains) & (1 << (i__)))
> -
> -#define for_each_fw_domain(domain__, dev_priv__, i__) \
> -	for_each_fw_domain_mask(domain__, FORCEWAKE_ALL, dev_priv__, i__)
> +#define for_each_fw_domain_mask(domain__, mask__, dev_priv__) \
> +	for ((domain__) = &(dev_priv__)->uncore.fw_domain[0]; \
> +	     (domain__) < &(dev_priv__)->uncore.fw_domain[FW_DOMAIN_ID_COUNT]; \
> +	     (domain__)++) \
> +		for_each_if ((mask__) & (domain__)->mask)
> +

For consistency with the for_each_engine() macros, the name ought to be 
"for_each_fw_domain_mask*ed*()", showing that we're iterating over a 
subset selected by the 'mask' parameter, rather than iterating over all 
possible masks.
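
That is, just the name; the body could stay exactly as in your patch:

	#define for_each_fw_domain_masked(domain__, mask__, dev_priv__) \
		for ((domain__) = &(dev_priv__)->uncore.fw_domain[0]; \
		     (domain__) < &(dev_priv__)->uncore.fw_domain[FW_DOMAIN_ID_COUNT]; \
		     (domain__)++) \
			for_each_if ((mask__) & (domain__)->mask)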

.Dave.

> +#define for_each_fw_domain(domain__, dev_priv__) \
> +	for_each_fw_domain_mask(domain__, FORCEWAKE_ALL, dev_priv__)
>
>   #define CSR_VERSION(major, minor)	((major) << 16 | (minor))
>   #define CSR_VERSION_MAJOR(version)	((version) >> 16)
> diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
> index 76ac990de354..49edd641b434 100644
> --- a/drivers/gpu/drm/i915/intel_uncore.c
> +++ b/drivers/gpu/drm/i915/intel_uncore.c
> @@ -111,9 +111,8 @@ static void
>   fw_domains_get(struct drm_i915_private *dev_priv, enum forcewake_domains fw_domains)
>   {
>   	struct intel_uncore_forcewake_domain *d;
> -	enum forcewake_domain_id id;
>
> -	for_each_fw_domain_mask(d, fw_domains, dev_priv, id) {
> +	for_each_fw_domain_mask(d, fw_domains, dev_priv) {
>   		fw_domain_wait_ack_clear(d);
>   		fw_domain_get(d);
>   		fw_domain_wait_ack(d);
> @@ -124,9 +123,8 @@ static void
>   fw_domains_put(struct drm_i915_private *dev_priv, enum forcewake_domains fw_domains)
>   {
>   	struct intel_uncore_forcewake_domain *d;
> -	enum forcewake_domain_id id;
>
> -	for_each_fw_domain_mask(d, fw_domains, dev_priv, id) {
> +	for_each_fw_domain_mask(d, fw_domains, dev_priv) {
>   		fw_domain_put(d);
>   		fw_domain_posting_read(d);
>   	}
> @@ -136,10 +134,9 @@ static void
>   fw_domains_posting_read(struct drm_i915_private *dev_priv)
>   {
>   	struct intel_uncore_forcewake_domain *d;
> -	enum forcewake_domain_id id;
>
>   	/* No need to do for all, just do for first found */
> -	for_each_fw_domain(d, dev_priv, id) {
> +	for_each_fw_domain(d, dev_priv) {
>   		fw_domain_posting_read(d);
>   		break;
>   	}
> @@ -149,12 +146,11 @@ static void
>   fw_domains_reset(struct drm_i915_private *dev_priv, enum forcewake_domains fw_domains)
>   {
>   	struct intel_uncore_forcewake_domain *d;
> -	enum forcewake_domain_id id;
>
>   	if (dev_priv->uncore.fw_domains == 0)
>   		return;
>
> -	for_each_fw_domain_mask(d, fw_domains, dev_priv, id)
> +	for_each_fw_domain_mask(d, fw_domains, dev_priv)
>   		fw_domain_reset(d);
>
>   	fw_domains_posting_read(dev_priv);
> @@ -256,7 +252,6 @@ void intel_uncore_forcewake_reset(struct drm_device *dev, bool restore)
>   	unsigned long irqflags;
>   	struct intel_uncore_forcewake_domain *domain;
>   	int retry_count = 100;
> -	enum forcewake_domain_id id;
>   	enum forcewake_domains fw = 0, active_domains;
>
>   	/* Hold uncore.lock across reset to prevent any register access
> @@ -266,7 +261,7 @@ void intel_uncore_forcewake_reset(struct drm_device *dev, bool restore)
>   	while (1) {
>   		active_domains = 0;
>
> -		for_each_fw_domain(domain, dev_priv, id) {
> +		for_each_fw_domain(domain, dev_priv) {
>   			if (hrtimer_cancel(&domain->timer) == 0)
>   				continue;
>
> @@ -275,9 +270,9 @@ void intel_uncore_forcewake_reset(struct drm_device *dev, bool restore)
>
>   		spin_lock_irqsave(&dev_priv->uncore.lock, irqflags);
>
> -		for_each_fw_domain(domain, dev_priv, id) {
> +		for_each_fw_domain(domain, dev_priv) {
>   			if (hrtimer_active(&domain->timer))
> -				active_domains |= (1 << id);
> +				active_domains |= domain->mask;
>   		}
>
>   		if (active_domains == 0)
> @@ -294,9 +289,9 @@ void intel_uncore_forcewake_reset(struct drm_device *dev, bool restore)
>
>   	WARN_ON(active_domains);
>
> -	for_each_fw_domain(domain, dev_priv, id)
> +	for_each_fw_domain(domain, dev_priv)
>   		if (domain->wake_count)
> -			fw |= 1 << id;
> +			fw |= domain->mask;
>
>   	if (fw)
>   		dev_priv->uncore.funcs.force_wake_put(dev_priv, fw);
> @@ -418,16 +413,15 @@ static void __intel_uncore_forcewake_get(struct drm_i915_private *dev_priv,
>   					 enum forcewake_domains fw_domains)
>   {
>   	struct intel_uncore_forcewake_domain *domain;
> -	enum forcewake_domain_id id;
>
>   	if (!dev_priv->uncore.funcs.force_wake_get)
>   		return;
>
>   	fw_domains &= dev_priv->uncore.fw_domains;
>
> -	for_each_fw_domain_mask(domain, fw_domains, dev_priv, id) {
> +	for_each_fw_domain_mask(domain, fw_domains, dev_priv) {
>   		if (domain->wake_count++)
> -			fw_domains &= ~(1 << id);
> +			fw_domains &= ~domain->mask;
>   	}
>
>   	if (fw_domains)
> @@ -485,14 +479,13 @@ static void __intel_uncore_forcewake_put(struct drm_i915_private *dev_priv,
>   					 enum forcewake_domains fw_domains)
>   {
>   	struct intel_uncore_forcewake_domain *domain;
> -	enum forcewake_domain_id id;
>
>   	if (!dev_priv->uncore.funcs.force_wake_put)
>   		return;
>
>   	fw_domains &= dev_priv->uncore.fw_domains;
>
> -	for_each_fw_domain_mask(domain, fw_domains, dev_priv, id) {
> +	for_each_fw_domain_mask(domain, fw_domains, dev_priv) {
>   		if (WARN_ON(domain->wake_count == 0))
>   			continue;
>
> @@ -546,12 +539,11 @@ void intel_uncore_forcewake_put__locked(struct drm_i915_private *dev_priv,
>   void assert_forcewakes_inactive(struct drm_i915_private *dev_priv)
>   {
>   	struct intel_uncore_forcewake_domain *domain;
> -	enum forcewake_domain_id id;
>
>   	if (!dev_priv->uncore.funcs.force_wake_get)
>   		return;
>
> -	for_each_fw_domain(domain, dev_priv, id)
> +	for_each_fw_domain(domain, dev_priv)
>   		WARN_ON(domain->wake_count);
>   }
>
> @@ -727,15 +719,14 @@ static inline void __force_wake_auto(struct drm_i915_private *dev_priv,
>   				     enum forcewake_domains fw_domains)
>   {
>   	struct intel_uncore_forcewake_domain *domain;
> -	enum forcewake_domain_id id;
>
>   	if (WARN_ON(!fw_domains))
>   		return;
>
>   	/* Ideally GCC would be constant-fold and eliminate this loop */
> -	for_each_fw_domain_mask(domain, fw_domains, dev_priv, id) {
> +	for_each_fw_domain_mask(domain, fw_domains, dev_priv) {
>   		if (domain->wake_count) {
> -			fw_domains &= ~(1 << id);
> +			fw_domains &= ~domain->mask;
>   			continue;
>   		}
>
> @@ -1156,6 +1147,12 @@ static void fw_domain_init(struct drm_i915_private *dev_priv,
>   	d->i915 = dev_priv;
>   	d->id = domain_id;
>
> +	BUILD_BUG_ON(FORCEWAKE_RENDER != (1 << FW_DOMAIN_ID_RENDER));
> +	BUILD_BUG_ON(FORCEWAKE_BLITTER != (1 << FW_DOMAIN_ID_BLITTER));
> +	BUILD_BUG_ON(FORCEWAKE_MEDIA != (1 << FW_DOMAIN_ID_MEDIA));
> +
> +	d->mask = 1 << domain_id;
> +
>   	hrtimer_init(&d->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
>   	d->timer.function = intel_uncore_fw_release_timer;
>
>

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/3] drm/i915: Use consistent forcewake auto-release timeout across kernel configs
  2016-04-04 18:58 ` [PATCH 1/3] drm/i915: Use consistent forcewake auto-release timeout across kernel configs Chris Wilson
@ 2016-04-04 19:41   ` Dave Gordon
  2016-04-04 20:22     ` Chris Wilson
  2016-04-05  8:54   ` Tvrtko Ursulin
  1 sibling, 1 reply; 23+ messages in thread
From: Dave Gordon @ 2016-04-04 19:41 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On 04/04/16 19:58, Chris Wilson wrote:
> On Mon, Apr 04, 2016 at 05:51:09PM +0100, Tvrtko Ursulin wrote:
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> Current implementation releases the forcewake at any time between
>> straight away, and one jiffie from the last put, or first automatic
>> grab.
>
> That isn't the problem though. The problem is that we set the timer on
> first use rather than last use. All you are stating here is that by
> lengthening the timeout on your system you reduce the number of times it
> times out.

I thought that was already done, per my comments on one of your recent 
patches (rename __force_wake_get to __force_wake_auto) and your reply 
(gains are probably worth extra complexity). But of course Tvrtko wasn't 
here the last few days so may not have seen that. I suppose it would 
make sense to fold both changes together.

But changing the timeout to 1ms makes it generally *shorter* than 
before, not longer. The gain can't just be from having a longer timeout, 
'cos it isn't. (Average timeout now 1.5ms vs. previous 8ms?)

I think it should come from the fact that accesses are likely to be 
bursty, having a bimodal distribution in the time domain. When we've 
made one access, we're likely to make another much sooner than 1ms 
later, but once we stop making accesses there will probably be far more 
than 1ms before the next set of accesses.

> Having said that, the conversion to hrtimer seems sensible but to add
> tracking of the last access as opposed to first we either fallback to
> jiffie (in which case hrtimer is moot) or rely on ktime_get_raw() being
> fast enough for every register write. Hmm, my usual response to that has
> been if it matters we avoid the heavyweight macros and use the _FW
> interface - or even raw readl/writel.
>
> Could you try storing ktime_get_raw() on every access and rearming the
> timer if it expires before last-access + min-period?
> -Chris

One more idea - do we want two different expiry periods depending on 
whether the caller used the _auto version or not?

If they did, they're probably anticipating "only a few" accesses in the 
current operation, but perhaps with no idea about when the next set of 
accesses will occur.

If not, then the final decrement means they think they've finished for a 
while, and perhaps don't expect to come back for quite a long time. In 
that situation the timeout only helps avoid rapid off/on cycles by 
unrelated functions, not between logically-related operations.

So would two different timeouts make sense? Or even ...

_auto: longish timeout from the *first* _get() - don't rearm thereafter
other: shortish timeout from the _put() - non-auto _get() will cancel

or is that just too complicated?

.Dave.
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/3] drm/i915: Use consistent forcewake auto-release timeout across kernel configs
  2016-04-04 19:41   ` Dave Gordon
@ 2016-04-04 20:22     ` Chris Wilson
  0 siblings, 0 replies; 23+ messages in thread
From: Chris Wilson @ 2016-04-04 20:22 UTC (permalink / raw)
  To: Dave Gordon; +Cc: intel-gfx

On Mon, Apr 04, 2016 at 08:41:20PM +0100, Dave Gordon wrote:
> On 04/04/16 19:58, Chris Wilson wrote:
> >On Mon, Apr 04, 2016 at 05:51:09PM +0100, Tvrtko Ursulin wrote:
> >>From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>
> >>Current implementation releases the forcewake at any time between
> >>straight away, and one jiffie from the last put, or first automatic
> >>grab.
> >
> >That isn't the problem though. The problem is that we set the timer on
> >first use rather than last use. All you are stating here is that by
> >lengthening the timeout on your system you reduce the number of times it
> >times out.
> 
> I thought that was already done, per my comments on one of your
> recent patches (rename __force_wake_get to __force_wake_auto) and
> your reply (gains are probably worth extra complexity). But of
> course Tvrtko wasn't here the last few days so may not have seen
> that. I suppose it would make sense to fold both changes together.

We were talking about the cost of having 2 passes vs 1 to do the same
task. Before Tvrtko left, I realised again that we didn't defer the
timer itself for every access; rearming a timer on every register access
seems costly (more so switching to hrtimer). Rearming the timer inside
the timer callback seems a reasonable compromise.
 
> But changing the timeout to 1ms makes it generally *shorter* than
> before, not longer. The gain can't just be from having a longer
> timeout, 'cos it isn't. (Average timeout now 1.5ms vs. previous
> 8ms?)

It is. jiffies+1 does not equal an 8ms delay (it would be 1ms on his
machine anyway, 10ms on mine). The issue is that +1 jiffie is up to 1
jiffie and often close to 0, and that very well explains the wakeups
Tvrtko measured.
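
To put rough numbers on the old scheme (illustrative only):

	/* old code: expire on the next tick boundary */
	mod_timer_pinned(&d->timer, jiffies + 1);

	/*
	 * The real delay is anywhere in (0, jiffies_to_msecs(1)] - up to
	 * 1ms at HZ=1000 or 10ms at HZ=100, and close to zero if the put
	 * happens just before a tick.
	 */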

> I think it should come from the fact that accesses are likely to be
> bursty, having a bimodal distribution in the time domain. When we've
> made one access, we're likely to make another much sooner than 1ms
> later, but once we stop making accesses there will probably be far
> more than 1ms before the next set of accesses.

Yes. We expect temporal coherence in the access pattern. That's assuredly
so since we very rarely access one register by itself.

> One more idea - do we want two different expiry periods depending on
> whether the caller used the _auto version or not?

Hard to tell. Broadly speaking we have two users of the non-auto
interface: userspace disabling forcewake indefinitely, and the other is
code that wants to hold the spinlock + forcewake over a bunch of
register writes.
 
> If they did, they're probably anticipating "only a few" accesses in
> the current operation, but perhaps with no idea about when the next
> set of accesses will occur.
> 
> If not, then the final decrement means they think they've finished
> for a while, and perhaps don't expect to come back for quite a long
> time. In that situation the timeout only helps avoid rapid off/on
> cycles by unrelated functions, not between logically-related
> operations.

Execlists. (Wants to hold forcewake over a bunch of reads and writes,
and may be very frequent context switches or very slow)
 
> So would two different timeouts make sense? Or even ...
> 
> _auto: longish timeout from the *first* _get() - don't rearm thereafter
> other: shortish timeout from the _put() - non-auto _get() will cancel
> 
> or is that just too complicated?

The challenge is that I think we should be pushing for shorter timeouts,
because of the bimodality in many accesses followed by a long period of
sleep. One of the original reasons for the choice in 1 jiffie with a
pinned timer was that it should mean that on the next call to schedule()
we should be deasserting forcewake (i.e. we only hold forcewake for this
sequence of register accesses). However, execlists scuppers that by
having an interrupt context that requires forcewake, and frequently so.

So I think we do need a combination of something like task_work_run() to
tie a forcewake reference to the active task with something like an
extra timer to allow frequent wakers to avoid having to retake
forcewake.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 23+ messages in thread

* ✗ Fi.CI.BAT: failure for series starting with [1/3] drm/i915: Use consistent forcewake auto-release timeout across kernel configs
  2016-04-04 16:51 [PATCH 1/3] drm/i915: Use consistent forcewake auto-release timeout across kernel configs Tvrtko Ursulin
                   ` (3 preceding siblings ...)
  2016-04-04 18:58 ` Dave Gordon
@ 2016-04-05  6:26 ` Patchwork
  2016-04-05 11:01 ` [PATCH v2 1/3] " Tvrtko Ursulin
  5 siblings, 0 replies; 23+ messages in thread
From: Patchwork @ 2016-04-05  6:26 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

== Series Details ==

Series: series starting with [1/3] drm/i915: Use consistent forcewake auto-release timeout across kernel configs
URL   : https://patchwork.freedesktop.org/series/5282/
State : failure

== Summary ==

Series 5282v1 Series without cover letter
http://patchwork.freedesktop.org/api/1.0/series/5282/revisions/1/mbox/

Test drv_hangman:
        Subgroup error-state-basic:
                pass       -> FAIL       (ilk-hp8440p)
Test kms_force_connector_basic:
        Subgroup prune-stale-modes:
                skip       -> PASS       (ilk-hp8440p)
Test kms_pipe_crc_basic:
        Subgroup suspend-read-crc-pipe-b:
                incomplete -> PASS       (hsw-gt2)

bdw-nuci7        total:196  pass:184  dwarn:0   dfail:0   fail:0   skip:12 
bdw-ultra        total:196  pass:175  dwarn:0   dfail:0   fail:0   skip:21 
bsw-nuc-2        total:196  pass:159  dwarn:0   dfail:0   fail:0   skip:37 
byt-nuc          total:196  pass:161  dwarn:0   dfail:0   fail:0   skip:35 
hsw-brixbox      total:196  pass:174  dwarn:0   dfail:0   fail:0   skip:22 
hsw-gt2          total:196  pass:179  dwarn:0   dfail:0   fail:0   skip:17 
ilk-hp8440p      total:196  pass:131  dwarn:0   dfail:0   fail:1   skip:64 
ivb-t430s        total:196  pass:171  dwarn:0   dfail:0   fail:0   skip:25 
skl-i7k-2        total:196  pass:173  dwarn:0   dfail:0   fail:0   skip:23 
snb-dellxps      total:196  pass:162  dwarn:0   dfail:0   fail:0   skip:34 
snb-x220t        total:196  pass:162  dwarn:0   dfail:0   fail:1   skip:33 

Results at /archive/results/CI_IGT_test/Patchwork_1793/

aedfaaef290af9c8df7d9f4adf22cbe21704d091 drm-intel-nightly: 2016y-04m-04d-13h-09m-54s UTC integration manifest
51638b310bc8dcb6536bbb105efaa8c03eb54463 drm/i915: Do not serialize forcewake acquire across domains
5b17ed47669e2d3be296c473240e57db3d27adea drm/i915: Simplify for_each_fw_domain iterators
a3e632b7f4a64695a3601c8b0ef4fbade564507c drm/i915: Use consistent forcewake auto-release timeout across kernel configs

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/3] drm/i915: Use consistent forcewake auto-release timeout across kernel configs
  2016-04-04 18:58 ` [PATCH 1/3] drm/i915: Use consistent forcewake auto-release timeout across kernel configs Chris Wilson
  2016-04-04 19:41   ` Dave Gordon
@ 2016-04-05  8:54   ` Tvrtko Ursulin
  2016-04-05  8:59     ` Chris Wilson
  1 sibling, 1 reply; 23+ messages in thread
From: Tvrtko Ursulin @ 2016-04-05  8:54 UTC (permalink / raw)
  To: Chris Wilson, Intel-gfx


On 04/04/16 19:58, Chris Wilson wrote:
> On Mon, Apr 04, 2016 at 05:51:09PM +0100, Tvrtko Ursulin wrote:
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> Current implementation releases the forcewake at any time between
>> straight away, and one jiffie from the last put, or first automatic
>> grab.
>
> That isn't the problem though. The problem is that we set the timer on
> first use rather than last use. All you are stating here is that by
> lengthening the timeout on your system you reduce the number of times it
> times out.

It is true the reduction I see is due to lengthening of the average timeout.

But with regards to the re-arming approach, I thought so initially myself, 
but then, if we expect bursty access it shouldn't matter and the 
simplicity of doing it the current way is better.

Even in practice, I noticed the effect of lengthening the timeout is 
much greater than moving the arming to the last access. And to get to 
very few to no auto-releases on busy workloads we need timeouts in the 
region of 5ms, which would be a big change. Or maybe not if you consider HZ=100 
kernels.

It is very difficult to know what is actually better considering power 
between the CPU and GPU and performance. So I thought the best would be 
to keep it similar to the current timings, just fix the dependency on HZ 
and also slack with hrtimers might help with something.

As a final data point, explicit puts and auto releases seem to be 
relatively balanced in my testing. With this patch T-Rex for example 
auto-releases in the region of 3-4 times in 10ms, with around 5-10 
explicit gets, and 5-10 implicit gets in 10ms.

A different, interrupt heavy workload (~20k irqs/sec) manages similarly 
2-4 auto-releases per 10ms, and has ~250 explicit gets and ~380 implicit 
per 10ms.

Looking at the two I think the fact they manage to auto-release 
relatively similarly, compared to a huge difference in fw gets, suggests 
burstiness. So I am not sure that any smarts with the release period 
would be interesting. At least not without serious power/perf measurements.

> Having said that, the conversion to hrtimer seems sensible but to add
> tracking of the last access as opposed to first we either fallback to
> jiffie (in which case hrtimer is moot) or rely on ktime_get_raw() being
> fast enough for every register write. Hmm, my usual response to that has
> been if it matters we avoid the heavyweight macros and use the _FW
> interface - or even raw readl/writel.
>
> Could you try storing ktime_get_raw() on every access and rearming the
> timer if it expires before last-access + min-period?

That would be similar to your patch from before my holiday, right? As I 
said above, I did not notice much change with that approach. Just 
extending the timeout has a much greater effect, but as I wrote above, I 
am not sure we want to change it.

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/3] drm/i915: Use consistent forcewake auto-release timeout across kernel configs
  2016-04-05  8:54   ` Tvrtko Ursulin
@ 2016-04-05  8:59     ` Chris Wilson
  2016-04-05 10:02       ` Tvrtko Ursulin
  0 siblings, 1 reply; 23+ messages in thread
From: Chris Wilson @ 2016-04-05  8:59 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: Intel-gfx

On Tue, Apr 05, 2016 at 09:54:58AM +0100, Tvrtko Ursulin wrote:
> 
> On 04/04/16 19:58, Chris Wilson wrote:
> >On Mon, Apr 04, 2016 at 05:51:09PM +0100, Tvrtko Ursulin wrote:
> >>From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>
> >>Current implementation releases the forcewake at any time between
> >>straight away, and one jiffie from the last put, or first automatic
> >>grab.
> >
> >That isn't the problem though. The problem is that we set the timer on
> >first use rather than last use. All you are stating here is that by
> >lengthening the timeout on your system you reduce the number of times it
> >times out.
> 
> It is true the reduction I see is due to lengthening of the average timeout.
> 
> But with regards to the re-arming approach, I thought so initially
> myself, but then, if we expect bursty access it shouldn't
> matter and the simplicity of doing it the current way is better.
> 
> Even in practice, I noticed the effect of lengthening the timeout is
> much greater than moving the arming to the last access. And to get
> to very few to no auto-releases on busy workloads we need timeouts
> in the region of 5ms, which would be a big change. Or maybe not if you
> consider HZ=100 kernels.
> 
> It is very difficult to know what is actually better considering
> power between the CPU and GPU and performance. So I thought the best
> would be to keep it similar to the current timings, just fix the
> dependency on HZ and also slack with hrtimers might help with
> something.
> 
> As a final data point, explicit puts and auto releases seem to be
> relatively balanced in my testing. With this patch T-Rex for example
> auto-releases in the region of 3-4 times in 10ms, with around 5-10
> explicit gets, and 5-10 implicit gets in 10ms.
> 
> A different, interrupt heavy workload (~20k irqs/sec) manages
> similarly 2-4 auto-releases per 10ms, and has ~250 explicit gets and
> ~380 implicit per 10ms.
> 
> Looking at the two I think the fact they manage to auto-release
> relatively similarly, compared to a huge difference in fw gets,
> suggests burstiness. So I am not sure that any smarts with the
> release period would be interesting. At least not without serious
> power/perf measurements.
> 
> >Having said that, the conversion to hrtimer seems sensible but to add
> >tracking of the last access as opposed to first we either fallback to
> >jiffie (in which case hrtimer is moot) or rely on ktime_get_raw() being
> >fast enough for every register write. Hmm, my usual response to that has
> >been if it matters we avoid the heavyweight macros and use the _FW
> >interface - or even raw readl/writel.
> >
> >Could you try storing ktime_get_raw() on every access and rearming the
> >timer if it expires before last-access + min-period?
> 
> That would be similar to your patch from before my holiday, right?
> As I said above, I did not notice much change with that approach.
> Just extending the timeout has a much greater effect, but as I wrote
> above, I am not sure we want to change it.

There was just one bug in that patch in checking the expiration that
makes a huge difference, but if you could please add the discussion above to
the commit log, that would be invaluable.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/3] drm/i915: Do not serialize forcewake acquire across domains
  2016-04-04 19:07   ` Chris Wilson
@ 2016-04-05  9:02     ` Tvrtko Ursulin
  2016-04-05  9:47       ` Chris Wilson
  0 siblings, 1 reply; 23+ messages in thread
From: Tvrtko Ursulin @ 2016-04-05  9:02 UTC (permalink / raw)
  To: Chris Wilson, Intel-gfx


On 04/04/16 20:07, Chris Wilson wrote:
> On Mon, Apr 04, 2016 at 05:51:11PM +0100, Tvrtko Ursulin wrote:
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> On platforms with multiple forcewake domains it seems more efficient
>> to request all desired ones and then to wait for acks to avoid
>> needlessly serializing on each domain.
>
> Not convinced since we have more machines with one domain than two. What
> I did was to compact the domains array so that we only iterated over the
> known set - but that feels overkill when we only have two domains today.
>
> For the same reason (only one machine with two domains), I didn't think
> separate functions to iterate over one domain and another to iterate
> over all was worth it.
>
> What you can do though is remove an excess posting read from
> fw_domains_put.
>
> Compared to the cost of a register access (the spinlock irq mostly) the
> iterator doesn't strike me as being that worthwhile an optimisation
> target.

Correct, I thought we agreed that the majority of the CPU time 
attributed to fw_domains_get is from the busy spinning while waiting on 
the ack from the GPU.

This patch is not optimising the iterator, but requests all domains to 
be woken up and then waits for acks. It changes the time spent busy 
spinning from Td1 + ... + Tdn to max(Td1...Tdn).
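
(Roughly, the shape of the change, written with the simplified iterator
from patch 2/3; the actual diff is in the v2 posting further down the
thread:)

        /* Before: each domain is requested and waited on in turn, so the
         * busy-wait times add up (Td1 + ... + Tdn). */
        for_each_fw_domain_masked(d, fw_domains, dev_priv) {
                fw_domain_wait_ack_clear(d);
                fw_domain_get(d);
                fw_domain_wait_ack(d);
        }

        /* After: request every domain first, then wait for all the acks,
         * so the total busy-wait is roughly max(Td1 ... Tdn). */
        for_each_fw_domain_masked(d, fw_domains, dev_priv) {
                fw_domain_wait_ack_clear(d);
                fw_domain_get(d);
        }

        for_each_fw_domain_masked(d, fw_domains, dev_priv)
                fw_domain_wait_ack(d);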

Yes it is only interesting for platforms with more than one fw domain. 
But since we agreed iterator is not significant, the fact that it adds 
two loops* over the array should not be noticeable vs. the gain for 
multi-fw domain machines (of which there will be more and more as time goes by).

Regards,

Tvrtko

* Also because 2/3 from this series has shrunk the iterator
considerably, fw_domains_get remains pretty much the same size now
with two loops as it was before with one.

* Re: [PATCH 2/3] drm/i915: Simplify for_each_fw_domain iterators
  2016-04-04 19:14   ` Dave Gordon
@ 2016-04-05  9:05     ` Tvrtko Ursulin
  2016-04-05  9:36       ` Chris Wilson
  0 siblings, 1 reply; 23+ messages in thread
From: Tvrtko Ursulin @ 2016-04-05  9:05 UTC (permalink / raw)
  To: Dave Gordon, Intel-gfx


On 04/04/16 20:14, Dave Gordon wrote:
> On 04/04/16 17:51, Tvrtko Ursulin wrote:
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> As the vast majority of users does not use the domain id variable,
>
> "do not use"

Yep.

>
>> we can eliminate it from the iterator and also change the latter
>> using the same principle as was recently done for for_each_engine.
>>
>> For a couple of callers which do need the domain id, or the domain
>> mask, store the later in the domain itself at init time and use
>> both through it.
>
> "For a couple of callers which do need the domain mask, store it
> in the domain array (which already has the domain id), then both
> can be retrieved thence." ?

Better yes.

>> Result is clearer code and smaller generated binary, especially
>> in the tight fw get/put loops. Also, relationship between domain
>> id and mask is no longer assumed in the macro.
>
> "in the macro or elsewhere" ?

Just in the macro, it is still hardcoded in fw_domain_init.

>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>
> A few typos & clarifications above, plus a minor quibble about a macro
> name (which you haven't actually changed, but might as well fix).
> Otherwise LGTM, so
>
> Reviewed-by: Dave Gordon <david.s.gordon@intel.com>
>
> (even if you don't change the macro name)

Thanks, consistency sounds good so I will change it.

>> ---
>>   drivers/gpu/drm/i915/i915_debugfs.c |  5 ++---
>>   drivers/gpu/drm/i915/i915_drv.h     | 17 +++++++-------
>>   drivers/gpu/drm/i915/intel_uncore.c | 45
>> +++++++++++++++++--------------------
>>   3 files changed, 32 insertions(+), 35 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c
>> b/drivers/gpu/drm/i915/i915_debugfs.c
>> index a2e3af02c292..06fce014d0b4 100644
>> --- a/drivers/gpu/drm/i915/i915_debugfs.c
>> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
>> @@ -1464,12 +1464,11 @@ static int i915_forcewake_domains(struct
>> seq_file *m, void *data)
>>       struct drm_device *dev = node->minor->dev;
>>       struct drm_i915_private *dev_priv = dev->dev_private;
>>       struct intel_uncore_forcewake_domain *fw_domain;
>> -    int i;
>>
>>       spin_lock_irq(&dev_priv->uncore.lock);
>> -    for_each_fw_domain(fw_domain, dev_priv, i) {
>> +    for_each_fw_domain(fw_domain, dev_priv) {
>>           seq_printf(m, "%s.wake_count = %u\n",
>> -               intel_uncore_forcewake_domain_to_str(i),
>> +               intel_uncore_forcewake_domain_to_str(fw_domain->id),
>>                  fw_domain->wake_count);
>>       }
>>       spin_unlock_irq(&dev_priv->uncore.lock);
>> diff --git a/drivers/gpu/drm/i915/i915_drv.h
>> b/drivers/gpu/drm/i915/i915_drv.h
>> index 7d4c704d7d75..160f980f0368 100644
>> --- a/drivers/gpu/drm/i915/i915_drv.h
>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>> @@ -665,6 +665,7 @@ struct intel_uncore {
>>       struct intel_uncore_forcewake_domain {
>>           struct drm_i915_private *i915;
>>           enum forcewake_domain_id id;
>> +        enum forcewake_domains mask;
>
> At present this mask will always have only one bit set, but I suppose
> there might be some utility in allowing multiple bits (virtual domains?)

I did not like the name mask myself but couldn't think of anything 
better. Do you have a suggestion?

Regards,

Tvrtko


* Re: [PATCH 2/3] drm/i915: Simplify for_each_fw_domain iterators
  2016-04-05  9:05     ` Tvrtko Ursulin
@ 2016-04-05  9:36       ` Chris Wilson
  0 siblings, 0 replies; 23+ messages in thread
From: Chris Wilson @ 2016-04-05  9:36 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: Intel-gfx

On Tue, Apr 05, 2016 at 10:05:24AM +0100, Tvrtko Ursulin wrote:
> On 04/04/16 20:14, Dave Gordon wrote:
> >On 04/04/16 17:51, Tvrtko Ursulin wrote:
> >>diff --git a/drivers/gpu/drm/i915/i915_drv.h
> >>b/drivers/gpu/drm/i915/i915_drv.h
> >>index 7d4c704d7d75..160f980f0368 100644
> >>--- a/drivers/gpu/drm/i915/i915_drv.h
> >>+++ b/drivers/gpu/drm/i915/i915_drv.h
> >>@@ -665,6 +665,7 @@ struct intel_uncore {
> >>      struct intel_uncore_forcewake_domain {
> >>          struct drm_i915_private *i915;
> >>          enum forcewake_domain_id id;
> >>+        enum forcewake_domains mask;
> >
> >At present this mask will always have only one bit set, but I suppose
> >there might be some utility in allowing multiple bits (virtual domains?)
> 
> I did not like the name mask myself but couldn't think of anything
> better. Do you have a suggestion?

mask is fine as we use it as a mask. We've called it intel_engine_flag()
which is worse imo. If there is a much better name for this, we should
also fixup that name as well.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 3/3] drm/i915: Do not serialize forcewake acquire across domains
  2016-04-05  9:02     ` Tvrtko Ursulin
@ 2016-04-05  9:47       ` Chris Wilson
  0 siblings, 0 replies; 23+ messages in thread
From: Chris Wilson @ 2016-04-05  9:47 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: Intel-gfx

On Tue, Apr 05, 2016 at 10:02:28AM +0100, Tvrtko Ursulin wrote:
> 
> On 04/04/16 20:07, Chris Wilson wrote:
> >On Mon, Apr 04, 2016 at 05:51:11PM +0100, Tvrtko Ursulin wrote:
> >>From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>
> >>On platforms with multiple forcewake domains it seems more efficient
> >>to request all desired ones and then to wait for acks to avoid
> >>needlessly serializing on each domain.
> >
> >Not convinced since we have more machines with one domain than two. What
> >I did was to compact the domains array so that we only iterated over the
> >known set - but that feels overkill when we only have two domains today.
> >
> >For the same reason (only one machine with two domains), I didn't think
> >separate functions to iterate over one domain and another to iterate
> >over all was worth it.
> >
> >What you can do though is remove an excess posting read from
> >fw_domains_put.
> >
> >Compared to the cost of a register access (the spinlock irq mostly) the
> >iterator doesn't strike me as being that worthwhile an optimisation
> >target.
> 
> Correct, I thought we agreed that the majority of the CPU time
> attributed to fw_domains_get is from the busy spinning while waiting
> on the ack from the GPU.
> 
> This patch is not optimising the iterator, but requests all domains
> to be woken up and then waits for acks. It changes the time spent
> busy spinning from Td1 + ... + Tdn to max(Td1...Tdn).
> 
> Yes it is only interesting for platforms with more than one fw
> domain. But since we agreed iterator is not significant, the fact
> that it adds two loops* over the array should not be noticeable vs.
> the gain for multi-fw domain machines (of which there will be more
> and more as time goes by).

Then we should first be looking at the cases where we are requesting
multiple domains to be woken when we only need one.

Anyway you've changed my mind,
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 1/3] drm/i915: Use consistent forcewake auto-release timeout across kernel configs
  2016-04-05  8:59     ` Chris Wilson
@ 2016-04-05 10:02       ` Tvrtko Ursulin
  2016-04-05 10:44         ` Chris Wilson
  0 siblings, 1 reply; 23+ messages in thread
From: Tvrtko Ursulin @ 2016-04-05 10:02 UTC (permalink / raw)
  To: Chris Wilson, Intel-gfx


On 05/04/16 09:59, Chris Wilson wrote:
> On Tue, Apr 05, 2016 at 09:54:58AM +0100, Tvrtko Ursulin wrote:
>>
>> On 04/04/16 19:58, Chris Wilson wrote:
>>> On Mon, Apr 04, 2016 at 05:51:09PM +0100, Tvrtko Ursulin wrote:
>>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>
>>>> Current implementation releases the forcewake at any time between
>>>> straight away, and one jiffie from the last put, or first automatic
>>>> grab.
>>>
>>> That isn't the problem though. The problem is that we set the timer on
>>> first use rather than last use. All you are stating here is that by
>>> lengthening the timeout on your system you reduce the number of times it
>>> times out.
>>
>> It is true the reduction I see is due lengthening of the average timeout.
>>
>> But with regards to re-arming approach, I thought so initially
>> myself, but then, if we expect bursty access then it shouldn't
>> matter and the simplicity of doing it like it currently is better.
>>
>> Even in practice, I noticed the effect of lengthening the timeout is
>> much greater than moving the arming to the last access. And to get
>> to very few to none auto releases on busy workloads we need in the
>> regions of 5ms, which would be a big change. Or maybe not if you
>> consider HZ=100 kernels.
>>
>> It is very difficult to know what is actually better considering
>> power between the CPU and GPU and performance. So I though the best
>> would be to keep it similar to the current timings, just fix the
>> dependency on HZ and also slack with hrtimers might help with
>> something.
>>
>> As a final data point, explicit puts and auto releases seems to be
>> relatively balanced in my testing. With this patch T-Rex for example
>> auto-releases in the region of 3-4 times in 10ms, with around 5-10
>> explicit gets, and 5-10 implicit gets in 10ms.
>>
>> A different, interrupt heavy workload (~20k irqs/sec) manages
>> similarly 2-4 auto-releases per 10ms, and has ~250 explicit gets and
>> ~380 implicit per 10ms.
>>
>> Looking at the two I think the fact they manage to auto-release
>> relatively similarly, compared to a huge different in fw gets,
>> suggest burstyness. So I am not sure that any smarts with the
>> release period would be interesting. At least not without serious
>> power/perf measurements.
>>
>>> Having said that, the conversion to hrtimer seems sensible but to add
>>> tracking of the last access as opposed to first we either fallback to
>>> jiffie (in which case hrtimer is moot) or rely on ktime_get_raw() being
>>> fast enough for every register write. Hmm, my usual response to that has
>>> been if it matters we avoid the heavyweight macros and use the _FW
>>> interface - or even raw readl/writel.
>>>
>>> Could you try storing ktime_get_raw() on every access and rearming the
>>> timer if it expires before last-access + min-period?
>>
>> That would be similar to your patch from before my holiday, right?
>> As I said above, I did not notice much change with that approach.
>> Just extending the timeout has a much greater effect, but as I wrote
>> above, I am not sure we want to change it.
>
> There was just one bug in that patch in checking the expiration that
> makes a huge difference, but if you please add the discussion above to
> the commit log that would be invaluable.

I think I had a fixed version, would have to dig in branches now.

I will add more text to the commit.

Most importantly, do you think this would affect HZ=100 or HZ=250 kernels
at all? Presumably if it could, someone would know today that there is a 
difference between the kernels based on HZ already?

Regards,

Tvrtko

* Re: [PATCH 1/3] drm/i915: Use consistent forcewake auto-release timeout across kernel configs
  2016-04-05 10:02       ` Tvrtko Ursulin
@ 2016-04-05 10:44         ` Chris Wilson
  0 siblings, 0 replies; 23+ messages in thread
From: Chris Wilson @ 2016-04-05 10:44 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: Intel-gfx

On Tue, Apr 05, 2016 at 11:02:15AM +0100, Tvrtko Ursulin wrote:
> 
> On 05/04/16 09:59, Chris Wilson wrote:
> >On Tue, Apr 05, 2016 at 09:54:58AM +0100, Tvrtko Ursulin wrote:
> >>
> >>On 04/04/16 19:58, Chris Wilson wrote:
> >>>On Mon, Apr 04, 2016 at 05:51:09PM +0100, Tvrtko Ursulin wrote:
> >>>>From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>>>
> >>>>Current implementation releases the forcewake at any time between
> >>>>straight away, and one jiffie from the last put, or first automatic
> >>>>grab.
> >>>
> >>>That isn't the problem though. The problem is that we set the timer on
> >>>first use rather than last use. All you are stating here is that by
> >>>lengthening the timeout on your system you reduce the number of times it
> >>>times out.
> >>
> >>It is true the reduction I see is due lengthening of the average timeout.
> >>
> >>But with regards to re-arming approach, I thought so initially
> >>myself, but then, if we expect bursty access then it shouldn't
> >>matter and the simplicity of doing it like it currently is better.
> >>
> >>Even in practice, I noticed the effect of lengthening the timeout is
> >>much greater than moving the arming to the last access. And to get
> >>to very few to none auto releases on busy workloads we need in the
> >>regions of 5ms, which would be a big change. Or maybe not if you
> >>consider HZ=100 kernels.
> >>
> >>It is very difficult to know what is actually better considering
> >>power between the CPU and GPU and performance. So I though the best
> >>would be to keep it similar to the current timings, just fix the
> >>dependency on HZ and also slack with hrtimers might help with
> >>something.
> >>
> >>As a final data point, explicit puts and auto releases seems to be
> >>relatively balanced in my testing. With this patch T-Rex for example
> >>auto-releases in the region of 3-4 times in 10ms, with around 5-10
> >>explicit gets, and 5-10 implicit gets in 10ms.
> >>
> >>A different, interrupt heavy workload (~20k irqs/sec) manages
> >>similarly 2-4 auto-releases per 10ms, and has ~250 explicit gets and
> >>~380 implicit per 10ms.
> >>
> >>Looking at the two I think the fact they manage to auto-release
> >>relatively similarly, compared to a huge different in fw gets,
> >>suggest burstyness. So I am not sure that any smarts with the
> >>release period would be interesting. At least not without serious
> >>power/perf measurements.
> >>
> >>>Having said that, the conversion to hrtimer seems sensible but to add
> >>>tracking of the last access as opposed to first we either fallback to
> >>>jiffie (in which case hrtimer is moot) or rely on ktime_get_raw() being
> >>>fast enough for every register write. Hmm, my usual response to that has
> >>>been if it matters we avoid the heavyweight macros and use the _FW
> >>>interface - or even raw readl/writel.
> >>>
> >>>Could you try storing ktime_get_raw() on every access and rearming the
> >>>timer if it expires before last-access + min-period?
> >>
> >>That would be similar to your patch from before my holiday, right?
> >>As I said above, I did not notice much change with that approach.
> >>Just extending the timeout has a much greater effect, but as I wrote
> >>above, I am not sure we want to change it.
> >
> >There was just one bug in that patch in checking the expiration that
> >makes a huge difference, but if you please add the discussion above to
> >the commit log that would be invaluable.
> 
> I think I had a fixed version, would have to dig in branches now.
> 
> I will add more text to the commit.
> 
> Most importantly, do you think this would affect HZ=100 or HZ=250
> kernels at all? Presumably if it could, someone would know today
> that there is a difference between the kernels based on HZ already?

Hmm, I switch quite regularly - that may explain why sometimes I see
fw_domains_get quite heavy in the profiles and other times not. But I
have never correlated that with changing .configs. And I don't keep
stats for rc6 during stress tests - even more notable is the lack of
light desktop load testing.

One interesting side-effect of hrtimer is that iirc it is installed on
the local cpu and isn't run until we hit a preemption point (i.e. the
scheduler in my case), so those who are running high-latency, low-HZ
kernels are likely not affected at all!

As far as hrtimer is concerned, my only real worry is whether it is too
expensive - but since it is being increasingly widely used I shouldn't
worry myself too much about that (if it were to be a problem, it would
get fixed!).
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* [PATCH v2 1/3] drm/i915: Use consistent forcewake auto-release timeout across kernel configs
  2016-04-04 16:51 [PATCH 1/3] drm/i915: Use consistent forcewake auto-release timeout across kernel configs Tvrtko Ursulin
                   ` (4 preceding siblings ...)
  2016-04-05  6:26 ` ✗ Fi.CI.BAT: failure for series starting with [1/3] " Patchwork
@ 2016-04-05 11:01 ` Tvrtko Ursulin
  2016-04-05 11:01   ` [PATCH v2 2/3] drm/i915: Simplify for_each_fw_domain iterators Tvrtko Ursulin
                     ` (2 more replies)
  5 siblings, 3 replies; 23+ messages in thread
From: Tvrtko Ursulin @ 2016-04-05 11:01 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Because it is based on jiffies, the current implementation releases the
forcewake at any time between straight away and one jiffie later, which
is between 1ms and 10ms depending on the kernel configuration (CONFIG_HZ).

This is probably not what has been desired, since the dynamics of keeping
parts of the GPU awake should not be correlated with this kernel
configuration parameter.

Change the auto-release mechanism to use hrtimers and set the timeout to
1ms with a 1ms of slack. This should make the GPU power consistent
across kernel configs, and timer slack should enable some timer coalescing
where multiple force-wake domains exist, or with unrelated timers.

For GlBench/T-Rex this decreases the number of forcewake releases from
~480 to ~300 per second, and for a heavy combined OGL/OCL test from
~670 to ~360 (HZ=1000 kernel).

Even though this reduction can be attributed to the average release period
extending from 0-1ms to 1-2ms, as discussed above, it will make the
forcewake timeout consistent for different CONFIG_HZ values.

Real life measurements with the above workloads have shown that, with this
patch, both manage to auto-release the forcewake between 2-4 times per
10ms, even though the number of forcewake gets is dramatically different.

T-Rex requests between 5-10 explicit gets and 5-10 implicit gets in each
10ms period, while the OGL/OCL test requests 250 and 380 times in the same
period.

The two data points together suggest that the nature of the forcewake
accesses is bursty and that further changes, such as extending the
timeout or moving its start from the first to the last automatic
forcewake grab, should be carefully measured for power and
performance effects.

v2:
  * Commit spelling. (Dave Gordon)
  * More discussion on numbers in the commit. (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Dave Gordon <david.s.gordon@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h     |  2 +-
 drivers/gpu/drm/i915/intel_uncore.c | 25 ++++++++++++++++---------
 2 files changed, 17 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index dd187727c813..7d4c704d7d75 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -666,7 +666,7 @@ struct intel_uncore {
 		struct drm_i915_private *i915;
 		enum forcewake_domain_id id;
 		unsigned wake_count;
-		struct timer_list timer;
+		struct hrtimer timer;
 		i915_reg_t reg_set;
 		u32 val_set;
 		u32 val_clear;
diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
index ac1c545436af..76ac990de354 100644
--- a/drivers/gpu/drm/i915/intel_uncore.c
+++ b/drivers/gpu/drm/i915/intel_uncore.c
@@ -60,7 +60,11 @@ fw_domain_reset(const struct intel_uncore_forcewake_domain *d)
 static inline void
 fw_domain_arm_timer(struct intel_uncore_forcewake_domain *d)
 {
-	mod_timer_pinned(&d->timer, jiffies + 1);
+	d->wake_count++;
+	hrtimer_start_range_ns(&d->timer,
+			       ktime_set(0, NSEC_PER_MSEC),
+			       NSEC_PER_MSEC,
+			       HRTIMER_MODE_REL);
 }
 
 static inline void
@@ -224,9 +228,11 @@ static int __gen6_gt_wait_for_fifo(struct drm_i915_private *dev_priv)
 	return ret;
 }
 
-static void intel_uncore_fw_release_timer(unsigned long arg)
+static enum hrtimer_restart
+intel_uncore_fw_release_timer(struct hrtimer *timer)
 {
-	struct intel_uncore_forcewake_domain *domain = (void *)arg;
+	struct intel_uncore_forcewake_domain *domain =
+	       container_of(timer, struct intel_uncore_forcewake_domain, timer);
 	unsigned long irqflags;
 
 	assert_rpm_device_not_suspended(domain->i915);
@@ -240,6 +246,8 @@ static void intel_uncore_fw_release_timer(unsigned long arg)
 							  1 << domain->id);
 
 	spin_unlock_irqrestore(&domain->i915->uncore.lock, irqflags);
+
+	return HRTIMER_NORESTART;
 }
 
 void intel_uncore_forcewake_reset(struct drm_device *dev, bool restore)
@@ -259,16 +267,16 @@ void intel_uncore_forcewake_reset(struct drm_device *dev, bool restore)
 		active_domains = 0;
 
 		for_each_fw_domain(domain, dev_priv, id) {
-			if (del_timer_sync(&domain->timer) == 0)
+			if (hrtimer_cancel(&domain->timer) == 0)
 				continue;
 
-			intel_uncore_fw_release_timer((unsigned long)domain);
+			intel_uncore_fw_release_timer(&domain->timer);
 		}
 
 		spin_lock_irqsave(&dev_priv->uncore.lock, irqflags);
 
 		for_each_fw_domain(domain, dev_priv, id) {
-			if (timer_pending(&domain->timer))
+			if (hrtimer_active(&domain->timer))
 				active_domains |= (1 << id);
 		}
 
@@ -491,7 +499,6 @@ static void __intel_uncore_forcewake_put(struct drm_i915_private *dev_priv,
 		if (--domain->wake_count)
 			continue;
 
-		domain->wake_count++;
 		fw_domain_arm_timer(domain);
 	}
 }
@@ -732,7 +739,6 @@ static inline void __force_wake_auto(struct drm_i915_private *dev_priv,
 			continue;
 		}
 
-		domain->wake_count++;
 		fw_domain_arm_timer(domain);
 	}
 
@@ -1150,7 +1156,8 @@ static void fw_domain_init(struct drm_i915_private *dev_priv,
 	d->i915 = dev_priv;
 	d->id = domain_id;
 
-	setup_timer(&d->timer, intel_uncore_fw_release_timer, (unsigned long)d);
+	hrtimer_init(&d->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+	d->timer.function = intel_uncore_fw_release_timer;
 
 	dev_priv->uncore.fw_domains |= (1 << domain_id);
 
-- 
1.9.1


* [PATCH v2 2/3] drm/i915: Simplify for_each_fw_domain iterators
  2016-04-05 11:01 ` [PATCH v2 1/3] " Tvrtko Ursulin
@ 2016-04-05 11:01   ` Tvrtko Ursulin
  2016-04-05 11:01   ` [PATCH v2 3/3] drm/i915: Do not serialize forcewake acquire across domains Tvrtko Ursulin
  2016-04-05 11:22   ` [PATCH v2 1/3] drm/i915: Use consistent forcewake auto-release timeout across kernel configs Chris Wilson
  2 siblings, 0 replies; 23+ messages in thread
From: Tvrtko Ursulin @ 2016-04-05 11:01 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

As the vast majority of users do not use the domain id variable,
we can eliminate it from the iterator and also change the latter
using the same principle as was recently done for for_each_engine.

For a couple of callers which do need the domain mask, store it
in the domain array (which already has the domain id), then both
can be retrieved thence.

Result is clearer code and smaller generated binary, especially
in the tight fw get/put loops. Also, relationship between domain
id and mask is no longer assumed in the macro.

v2: Improve grammar in the commit message and rename the
    iterator to for_each_fw_domain_masked for consistency.
    (Dave Gordon)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Dave Gordon <david.s.gordon@intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c |  5 ++---
 drivers/gpu/drm/i915/i915_drv.h     | 17 +++++++-------
 drivers/gpu/drm/i915/intel_uncore.c | 45 +++++++++++++++++--------------------
 3 files changed, 32 insertions(+), 35 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index a2e3af02c292..06fce014d0b4 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1464,12 +1464,11 @@ static int i915_forcewake_domains(struct seq_file *m, void *data)
 	struct drm_device *dev = node->minor->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct intel_uncore_forcewake_domain *fw_domain;
-	int i;
 
 	spin_lock_irq(&dev_priv->uncore.lock);
-	for_each_fw_domain(fw_domain, dev_priv, i) {
+	for_each_fw_domain(fw_domain, dev_priv) {
 		seq_printf(m, "%s.wake_count = %u\n",
-			   intel_uncore_forcewake_domain_to_str(i),
+			   intel_uncore_forcewake_domain_to_str(fw_domain->id),
 			   fw_domain->wake_count);
 	}
 	spin_unlock_irq(&dev_priv->uncore.lock);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 7d4c704d7d75..fff025071a0f 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -665,6 +665,7 @@ struct intel_uncore {
 	struct intel_uncore_forcewake_domain {
 		struct drm_i915_private *i915;
 		enum forcewake_domain_id id;
+		enum forcewake_domains mask;
 		unsigned wake_count;
 		struct hrtimer timer;
 		i915_reg_t reg_set;
@@ -679,14 +680,14 @@ struct intel_uncore {
 };
 
 /* Iterate over initialised fw domains */
-#define for_each_fw_domain_mask(domain__, mask__, dev_priv__, i__) \
-	for ((i__) = 0, (domain__) = &(dev_priv__)->uncore.fw_domain[0]; \
-	     (i__) < FW_DOMAIN_ID_COUNT; \
-	     (i__)++, (domain__) = &(dev_priv__)->uncore.fw_domain[i__]) \
-		for_each_if (((mask__) & (dev_priv__)->uncore.fw_domains) & (1 << (i__)))
-
-#define for_each_fw_domain(domain__, dev_priv__, i__) \
-	for_each_fw_domain_mask(domain__, FORCEWAKE_ALL, dev_priv__, i__)
+#define for_each_fw_domain_masked(domain__, mask__, dev_priv__) \
+	for ((domain__) = &(dev_priv__)->uncore.fw_domain[0]; \
+	     (domain__) < &(dev_priv__)->uncore.fw_domain[FW_DOMAIN_ID_COUNT]; \
+	     (domain__)++) \
+		for_each_if ((mask__) & (domain__)->mask)
+
+#define for_each_fw_domain(domain__, dev_priv__) \
+	for_each_fw_domain_masked(domain__, FORCEWAKE_ALL, dev_priv__)
 
 #define CSR_VERSION(major, minor)	((major) << 16 | (minor))
 #define CSR_VERSION_MAJOR(version)	((version) >> 16)
diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
index 76ac990de354..5b9c307c5222 100644
--- a/drivers/gpu/drm/i915/intel_uncore.c
+++ b/drivers/gpu/drm/i915/intel_uncore.c
@@ -111,9 +111,8 @@ static void
 fw_domains_get(struct drm_i915_private *dev_priv, enum forcewake_domains fw_domains)
 {
 	struct intel_uncore_forcewake_domain *d;
-	enum forcewake_domain_id id;
 
-	for_each_fw_domain_mask(d, fw_domains, dev_priv, id) {
+	for_each_fw_domain_masked(d, fw_domains, dev_priv) {
 		fw_domain_wait_ack_clear(d);
 		fw_domain_get(d);
 		fw_domain_wait_ack(d);
@@ -124,9 +123,8 @@ static void
 fw_domains_put(struct drm_i915_private *dev_priv, enum forcewake_domains fw_domains)
 {
 	struct intel_uncore_forcewake_domain *d;
-	enum forcewake_domain_id id;
 
-	for_each_fw_domain_mask(d, fw_domains, dev_priv, id) {
+	for_each_fw_domain_masked(d, fw_domains, dev_priv) {
 		fw_domain_put(d);
 		fw_domain_posting_read(d);
 	}
@@ -136,10 +134,9 @@ static void
 fw_domains_posting_read(struct drm_i915_private *dev_priv)
 {
 	struct intel_uncore_forcewake_domain *d;
-	enum forcewake_domain_id id;
 
 	/* No need to do for all, just do for first found */
-	for_each_fw_domain(d, dev_priv, id) {
+	for_each_fw_domain(d, dev_priv) {
 		fw_domain_posting_read(d);
 		break;
 	}
@@ -149,12 +146,11 @@ static void
 fw_domains_reset(struct drm_i915_private *dev_priv, enum forcewake_domains fw_domains)
 {
 	struct intel_uncore_forcewake_domain *d;
-	enum forcewake_domain_id id;
 
 	if (dev_priv->uncore.fw_domains == 0)
 		return;
 
-	for_each_fw_domain_mask(d, fw_domains, dev_priv, id)
+	for_each_fw_domain_masked(d, fw_domains, dev_priv)
 		fw_domain_reset(d);
 
 	fw_domains_posting_read(dev_priv);
@@ -256,7 +252,6 @@ void intel_uncore_forcewake_reset(struct drm_device *dev, bool restore)
 	unsigned long irqflags;
 	struct intel_uncore_forcewake_domain *domain;
 	int retry_count = 100;
-	enum forcewake_domain_id id;
 	enum forcewake_domains fw = 0, active_domains;
 
 	/* Hold uncore.lock across reset to prevent any register access
@@ -266,7 +261,7 @@ void intel_uncore_forcewake_reset(struct drm_device *dev, bool restore)
 	while (1) {
 		active_domains = 0;
 
-		for_each_fw_domain(domain, dev_priv, id) {
+		for_each_fw_domain(domain, dev_priv) {
 			if (hrtimer_cancel(&domain->timer) == 0)
 				continue;
 
@@ -275,9 +270,9 @@ void intel_uncore_forcewake_reset(struct drm_device *dev, bool restore)
 
 		spin_lock_irqsave(&dev_priv->uncore.lock, irqflags);
 
-		for_each_fw_domain(domain, dev_priv, id) {
+		for_each_fw_domain(domain, dev_priv) {
 			if (hrtimer_active(&domain->timer))
-				active_domains |= (1 << id);
+				active_domains |= domain->mask;
 		}
 
 		if (active_domains == 0)
@@ -294,9 +289,9 @@ void intel_uncore_forcewake_reset(struct drm_device *dev, bool restore)
 
 	WARN_ON(active_domains);
 
-	for_each_fw_domain(domain, dev_priv, id)
+	for_each_fw_domain(domain, dev_priv)
 		if (domain->wake_count)
-			fw |= 1 << id;
+			fw |= domain->mask;
 
 	if (fw)
 		dev_priv->uncore.funcs.force_wake_put(dev_priv, fw);
@@ -418,16 +413,15 @@ static void __intel_uncore_forcewake_get(struct drm_i915_private *dev_priv,
 					 enum forcewake_domains fw_domains)
 {
 	struct intel_uncore_forcewake_domain *domain;
-	enum forcewake_domain_id id;
 
 	if (!dev_priv->uncore.funcs.force_wake_get)
 		return;
 
 	fw_domains &= dev_priv->uncore.fw_domains;
 
-	for_each_fw_domain_mask(domain, fw_domains, dev_priv, id) {
+	for_each_fw_domain_masked(domain, fw_domains, dev_priv) {
 		if (domain->wake_count++)
-			fw_domains &= ~(1 << id);
+			fw_domains &= ~domain->mask;
 	}
 
 	if (fw_domains)
@@ -485,14 +479,13 @@ static void __intel_uncore_forcewake_put(struct drm_i915_private *dev_priv,
 					 enum forcewake_domains fw_domains)
 {
 	struct intel_uncore_forcewake_domain *domain;
-	enum forcewake_domain_id id;
 
 	if (!dev_priv->uncore.funcs.force_wake_put)
 		return;
 
 	fw_domains &= dev_priv->uncore.fw_domains;
 
-	for_each_fw_domain_mask(domain, fw_domains, dev_priv, id) {
+	for_each_fw_domain_masked(domain, fw_domains, dev_priv) {
 		if (WARN_ON(domain->wake_count == 0))
 			continue;
 
@@ -546,12 +539,11 @@ void intel_uncore_forcewake_put__locked(struct drm_i915_private *dev_priv,
 void assert_forcewakes_inactive(struct drm_i915_private *dev_priv)
 {
 	struct intel_uncore_forcewake_domain *domain;
-	enum forcewake_domain_id id;
 
 	if (!dev_priv->uncore.funcs.force_wake_get)
 		return;
 
-	for_each_fw_domain(domain, dev_priv, id)
+	for_each_fw_domain(domain, dev_priv)
 		WARN_ON(domain->wake_count);
 }
 
@@ -727,15 +719,14 @@ static inline void __force_wake_auto(struct drm_i915_private *dev_priv,
 				     enum forcewake_domains fw_domains)
 {
 	struct intel_uncore_forcewake_domain *domain;
-	enum forcewake_domain_id id;
 
 	if (WARN_ON(!fw_domains))
 		return;
 
 	/* Ideally GCC would be constant-fold and eliminate this loop */
-	for_each_fw_domain_mask(domain, fw_domains, dev_priv, id) {
+	for_each_fw_domain_masked(domain, fw_domains, dev_priv) {
 		if (domain->wake_count) {
-			fw_domains &= ~(1 << id);
+			fw_domains &= ~domain->mask;
 			continue;
 		}
 
@@ -1156,6 +1147,12 @@ static void fw_domain_init(struct drm_i915_private *dev_priv,
 	d->i915 = dev_priv;
 	d->id = domain_id;
 
+	BUILD_BUG_ON(FORCEWAKE_RENDER != (1 << FW_DOMAIN_ID_RENDER));
+	BUILD_BUG_ON(FORCEWAKE_BLITTER != (1 << FW_DOMAIN_ID_BLITTER));
+	BUILD_BUG_ON(FORCEWAKE_MEDIA != (1 << FW_DOMAIN_ID_MEDIA));
+
+	d->mask = 1 << domain_id;
+
 	hrtimer_init(&d->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
 	d->timer.function = intel_uncore_fw_release_timer;
 
-- 
1.9.1


* [PATCH v2 3/3] drm/i915: Do not serialize forcewake acquire across domains
  2016-04-05 11:01 ` [PATCH v2 1/3] " Tvrtko Ursulin
  2016-04-05 11:01   ` [PATCH v2 2/3] drm/i915: Simplify for_each_fw_domain iterators Tvrtko Ursulin
@ 2016-04-05 11:01   ` Tvrtko Ursulin
  2016-04-05 11:22   ` [PATCH v2 1/3] drm/i915: Use consistent forcewake auto-release timeout across kernel configs Chris Wilson
  2 siblings, 0 replies; 23+ messages in thread
From: Tvrtko Ursulin @ 2016-04-05 11:01 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

On platforms with multiple forcewake domains it seems more efficient
to request all desired ones and then to wait for acks to avoid
needlessly serializing on each domain.

v2: Rebase.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_uncore.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
index 5b9c307c5222..899cb71914c7 100644
--- a/drivers/gpu/drm/i915/intel_uncore.c
+++ b/drivers/gpu/drm/i915/intel_uncore.c
@@ -115,8 +115,10 @@ fw_domains_get(struct drm_i915_private *dev_priv, enum forcewake_domains fw_doma
 	for_each_fw_domain_masked(d, fw_domains, dev_priv) {
 		fw_domain_wait_ack_clear(d);
 		fw_domain_get(d);
-		fw_domain_wait_ack(d);
 	}
+
+	for_each_fw_domain_masked(d, fw_domains, dev_priv)
+		fw_domain_wait_ack(d);
 }
 
 static void
-- 
1.9.1


* Re: [PATCH v2 1/3] drm/i915: Use consistent forcewake auto-release timeout across kernel configs
  2016-04-05 11:01 ` [PATCH v2 1/3] " Tvrtko Ursulin
  2016-04-05 11:01   ` [PATCH v2 2/3] drm/i915: Simplify for_each_fw_domain iterators Tvrtko Ursulin
  2016-04-05 11:01   ` [PATCH v2 3/3] drm/i915: Do not serialize forcewake acquire across domains Tvrtko Ursulin
@ 2016-04-05 11:22   ` Chris Wilson
  2 siblings, 0 replies; 23+ messages in thread
From: Chris Wilson @ 2016-04-05 11:22 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: Intel-gfx

On Tue, Apr 05, 2016 at 12:01:10PM +0100, Tvrtko Ursulin wrote:
>  		for_each_fw_domain(domain, dev_priv, id) {
> -			if (del_timer_sync(&domain->timer) == 0)
> +			if (hrtimer_cancel(&domain->timer) == 0)
>  				continue;

Had to double check that this was consistent: 0 => was not running, 1 =>
was pending, and so we need to manually release the timer.
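
(In other words, a commented restatement of the hunk above, relying on
hrtimer_cancel() returning 0 when the timer was not active and 1 when it
was active and has now been cancelled:)

        for_each_fw_domain(domain, dev_priv, id) {
                /* 0 => the auto-release already ran (or was never armed),
                 * so there is nothing to do for this domain. */
                if (hrtimer_cancel(&domain->timer) == 0)
                        continue;

                /* 1 => a pending auto-release was cancelled before it
                 * could fire, so drop the forcewake reference by hand. */
                intel_uncore_fw_release_timer(&domain->timer);
        }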

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

end of thread, other threads:[~2016-04-05 11:22 UTC | newest]

Thread overview: 23+ messages
2016-04-04 16:51 [PATCH 1/3] drm/i915: Use consistent forcewake auto-release timeout across kernel configs Tvrtko Ursulin
2016-04-04 16:51 ` [PATCH 2/3] drm/i915: Simplify for_each_fw_domain iterators Tvrtko Ursulin
2016-04-04 19:00   ` Chris Wilson
2016-04-04 19:14   ` Dave Gordon
2016-04-05  9:05     ` Tvrtko Ursulin
2016-04-05  9:36       ` Chris Wilson
2016-04-04 16:51 ` [PATCH 3/3] drm/i915: Do not serialize forcewake acquire across domains Tvrtko Ursulin
2016-04-04 19:07   ` Chris Wilson
2016-04-05  9:02     ` Tvrtko Ursulin
2016-04-05  9:47       ` Chris Wilson
2016-04-04 18:58 ` [PATCH 1/3] drm/i915: Use consistent forcewake auto-release timeout across kernel configs Chris Wilson
2016-04-04 19:41   ` Dave Gordon
2016-04-04 20:22     ` Chris Wilson
2016-04-05  8:54   ` Tvrtko Ursulin
2016-04-05  8:59     ` Chris Wilson
2016-04-05 10:02       ` Tvrtko Ursulin
2016-04-05 10:44         ` Chris Wilson
2016-04-04 18:58 ` Dave Gordon
2016-04-05  6:26 ` ✗ Fi.CI.BAT: failure for series starting with [1/3] " Patchwork
2016-04-05 11:01 ` [PATCH v2 1/3] " Tvrtko Ursulin
2016-04-05 11:01   ` [PATCH v2 2/3] drm/i915: Simplify for_each_fw_domain iterators Tvrtko Ursulin
2016-04-05 11:01   ` [PATCH v2 3/3] drm/i915: Do not serialize forcewake acquire across domains Tvrtko Ursulin
2016-04-05 11:22   ` [PATCH v2 1/3] drm/i915: Use consistent forcewake auto-release timeout across kernel configs Chris Wilson
