* [PATCH 1/3] drm/i915: Use consistent forcewake auto-release timeout across kernel configs
@ 2016-04-04 16:51 Tvrtko Ursulin
2016-04-04 16:51 ` [PATCH 2/3] drm/i915: Simplify for_each_fw_domain iterators Tvrtko Ursulin
` (5 more replies)
0 siblings, 6 replies; 23+ messages in thread
From: Tvrtko Ursulin @ 2016-04-04 16:51 UTC (permalink / raw)
To: Intel-gfx
From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
The current implementation releases the forcewake at any time between
immediately and one jiffie after the last put, or the first automatic
grab.
This is not the desired behaviour, since a jiffie is typically
between 1 and 10ms depending on the kernel configuration.
Change the auto-release mechanism to use hrtimers and set the timeout
to 1ms with 1ms of slack. This should make the GPU power consumption
consistent across kernel configs, and the slack should enable some timer
coalescing where multiple forcewake domains exist, or with unrelated
timers.
For GlBench/T-Rex this decreases the number of forcewake releases from
~480 to ~300 per second, and for a heavy combined OGL/OCL test from
~670 to ~360.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
drivers/gpu/drm/i915/i915_drv.h | 2 +-
drivers/gpu/drm/i915/intel_uncore.c | 25 ++++++++++++++++---------
2 files changed, 17 insertions(+), 10 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index dd187727c813..7d4c704d7d75 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -666,7 +666,7 @@ struct intel_uncore {
struct drm_i915_private *i915;
enum forcewake_domain_id id;
unsigned wake_count;
- struct timer_list timer;
+ struct hrtimer timer;
i915_reg_t reg_set;
u32 val_set;
u32 val_clear;
diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
index ac1c545436af..76ac990de354 100644
--- a/drivers/gpu/drm/i915/intel_uncore.c
+++ b/drivers/gpu/drm/i915/intel_uncore.c
@@ -60,7 +60,11 @@ fw_domain_reset(const struct intel_uncore_forcewake_domain *d)
static inline void
fw_domain_arm_timer(struct intel_uncore_forcewake_domain *d)
{
- mod_timer_pinned(&d->timer, jiffies + 1);
+ d->wake_count++;
+ hrtimer_start_range_ns(&d->timer,
+ ktime_set(0, NSEC_PER_MSEC),
+ NSEC_PER_MSEC,
+ HRTIMER_MODE_REL);
}
static inline void
@@ -224,9 +228,11 @@ static int __gen6_gt_wait_for_fifo(struct drm_i915_private *dev_priv)
return ret;
}
-static void intel_uncore_fw_release_timer(unsigned long arg)
+static enum hrtimer_restart
+intel_uncore_fw_release_timer(struct hrtimer *timer)
{
- struct intel_uncore_forcewake_domain *domain = (void *)arg;
+ struct intel_uncore_forcewake_domain *domain =
+ container_of(timer, struct intel_uncore_forcewake_domain, timer);
unsigned long irqflags;
assert_rpm_device_not_suspended(domain->i915);
@@ -240,6 +246,8 @@ static void intel_uncore_fw_release_timer(unsigned long arg)
1 << domain->id);
spin_unlock_irqrestore(&domain->i915->uncore.lock, irqflags);
+
+ return HRTIMER_NORESTART;
}
void intel_uncore_forcewake_reset(struct drm_device *dev, bool restore)
@@ -259,16 +267,16 @@ void intel_uncore_forcewake_reset(struct drm_device *dev, bool restore)
active_domains = 0;
for_each_fw_domain(domain, dev_priv, id) {
- if (del_timer_sync(&domain->timer) == 0)
+ if (hrtimer_cancel(&domain->timer) == 0)
continue;
- intel_uncore_fw_release_timer((unsigned long)domain);
+ intel_uncore_fw_release_timer(&domain->timer);
}
spin_lock_irqsave(&dev_priv->uncore.lock, irqflags);
for_each_fw_domain(domain, dev_priv, id) {
- if (timer_pending(&domain->timer))
+ if (hrtimer_active(&domain->timer))
active_domains |= (1 << id);
}
@@ -491,7 +499,6 @@ static void __intel_uncore_forcewake_put(struct drm_i915_private *dev_priv,
if (--domain->wake_count)
continue;
- domain->wake_count++;
fw_domain_arm_timer(domain);
}
}
@@ -732,7 +739,6 @@ static inline void __force_wake_auto(struct drm_i915_private *dev_priv,
continue;
}
- domain->wake_count++;
fw_domain_arm_timer(domain);
}
@@ -1150,7 +1156,8 @@ static void fw_domain_init(struct drm_i915_private *dev_priv,
d->i915 = dev_priv;
d->id = domain_id;
- setup_timer(&d->timer, intel_uncore_fw_release_timer, (unsigned long)d);
+ hrtimer_init(&d->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+ d->timer.function = intel_uncore_fw_release_timer;
dev_priv->uncore.fw_domains |= (1 << domain_id);
--
1.9.1
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 2/3] drm/i915: Simplify for_each_fw_domain iterators
  2016-04-04 16:51 [PATCH 1/3] drm/i915: Use consistent forcewake auto-release timeout across kernel configs Tvrtko Ursulin
@ 2016-04-04 16:51 ` Tvrtko Ursulin
  2016-04-04 19:00   ` Chris Wilson
  2016-04-04 19:14   ` Dave Gordon
  2016-04-04 16:51 ` [PATCH 3/3] drm/i915: Do not serialize forcewake acquire across domains Tvrtko Ursulin
  ` (4 subsequent siblings)
  5 siblings, 2 replies; 23+ messages in thread
From: Tvrtko Ursulin @ 2016-04-04 16:51 UTC (permalink / raw)
To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

As the vast majority of users does not use the domain id variable,
we can eliminate it from the iterator and also change the latter
using the same principle as was recently done for for_each_engine.

For a couple of callers which do need the domain id, or the domain
mask, store the later in the domain itself at init time and use
both through it.

Result is clearer code and smaller generated binary, especially
in the tight fw get/put loops. Also, relationship between domain
id and mask is no longer assumed in the macro.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c |  5 ++---
 drivers/gpu/drm/i915/i915_drv.h     | 17 +++++++-------
 drivers/gpu/drm/i915/intel_uncore.c | 45 +++++++++++++++++--------------------
 3 files changed, 32 insertions(+), 35 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index a2e3af02c292..06fce014d0b4 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1464,12 +1464,11 @@ static int i915_forcewake_domains(struct seq_file *m, void *data)
 	struct drm_device *dev = node->minor->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct intel_uncore_forcewake_domain *fw_domain;
-	int i;
 
 	spin_lock_irq(&dev_priv->uncore.lock);
-	for_each_fw_domain(fw_domain, dev_priv, i) {
+	for_each_fw_domain(fw_domain, dev_priv) {
 		seq_printf(m, "%s.wake_count = %u\n",
-			   intel_uncore_forcewake_domain_to_str(i),
+			   intel_uncore_forcewake_domain_to_str(fw_domain->id),
 			   fw_domain->wake_count);
 	}
 	spin_unlock_irq(&dev_priv->uncore.lock);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 7d4c704d7d75..160f980f0368 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -665,6 +665,7 @@ struct intel_uncore {
 	struct intel_uncore_forcewake_domain {
 		struct drm_i915_private *i915;
 		enum forcewake_domain_id id;
+		enum forcewake_domains mask;
 		unsigned wake_count;
 		struct hrtimer timer;
 		i915_reg_t reg_set;
@@ -679,14 +680,14 @@ struct intel_uncore {
 	};
 
 	/* Iterate over initialised fw domains */
-#define for_each_fw_domain_mask(domain__, mask__, dev_priv__, i__) \
-	for ((i__) = 0, (domain__) = &(dev_priv__)->uncore.fw_domain[0]; \
-	     (i__) < FW_DOMAIN_ID_COUNT; \
-	     (i__)++, (domain__) = &(dev_priv__)->uncore.fw_domain[i__]) \
-		for_each_if (((mask__) & (dev_priv__)->uncore.fw_domains) & (1 << (i__)))
-
-#define for_each_fw_domain(domain__, dev_priv__, i__) \
-	for_each_fw_domain_mask(domain__, FORCEWAKE_ALL, dev_priv__, i__)
+#define for_each_fw_domain_mask(domain__, mask__, dev_priv__) \
+	for ((domain__) = &(dev_priv__)->uncore.fw_domain[0]; \
+	     (domain__) < &(dev_priv__)->uncore.fw_domain[FW_DOMAIN_ID_COUNT]; \
+	     (domain__)++) \
+		for_each_if ((mask__) & (domain__)->mask)
+
+#define for_each_fw_domain(domain__, dev_priv__) \
+	for_each_fw_domain_mask(domain__, FORCEWAKE_ALL, dev_priv__)
 
 #define CSR_VERSION(major, minor)	((major) << 16 | (minor))
 #define CSR_VERSION_MAJOR(version)	((version) >> 16)
diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
index 76ac990de354..49edd641b434 100644
--- a/drivers/gpu/drm/i915/intel_uncore.c
+++ b/drivers/gpu/drm/i915/intel_uncore.c
@@ -111,9 +111,8 @@ static void
 fw_domains_get(struct drm_i915_private *dev_priv, enum forcewake_domains fw_domains)
 {
 	struct intel_uncore_forcewake_domain *d;
-	enum forcewake_domain_id id;
 
-	for_each_fw_domain_mask(d, fw_domains, dev_priv, id) {
+	for_each_fw_domain_mask(d, fw_domains, dev_priv) {
 		fw_domain_wait_ack_clear(d);
 		fw_domain_get(d);
 		fw_domain_wait_ack(d);
@@ -124,9 +123,8 @@ static void
 fw_domains_put(struct drm_i915_private *dev_priv, enum forcewake_domains fw_domains)
 {
 	struct intel_uncore_forcewake_domain *d;
-	enum forcewake_domain_id id;
 
-	for_each_fw_domain_mask(d, fw_domains, dev_priv, id) {
+	for_each_fw_domain_mask(d, fw_domains, dev_priv) {
 		fw_domain_put(d);
 		fw_domain_posting_read(d);
 	}
@@ -136,10 +134,9 @@ static void
 fw_domains_posting_read(struct drm_i915_private *dev_priv)
 {
 	struct intel_uncore_forcewake_domain *d;
-	enum forcewake_domain_id id;
 
 	/* No need to do for all, just do for first found */
-	for_each_fw_domain(d, dev_priv, id) {
+	for_each_fw_domain(d, dev_priv) {
 		fw_domain_posting_read(d);
 		break;
 	}
@@ -149,12 +146,11 @@ static void
 fw_domains_reset(struct drm_i915_private *dev_priv, enum forcewake_domains fw_domains)
 {
 	struct intel_uncore_forcewake_domain *d;
-	enum forcewake_domain_id id;
 
 	if (dev_priv->uncore.fw_domains == 0)
 		return;
 
-	for_each_fw_domain_mask(d, fw_domains, dev_priv, id)
+	for_each_fw_domain_mask(d, fw_domains, dev_priv)
 		fw_domain_reset(d);
 
 	fw_domains_posting_read(dev_priv);
@@ -256,7 +252,6 @@ void intel_uncore_forcewake_reset(struct drm_device *dev, bool restore)
 	unsigned long irqflags;
 	struct intel_uncore_forcewake_domain *domain;
 	int retry_count = 100;
-	enum forcewake_domain_id id;
 	enum forcewake_domains fw = 0, active_domains;
 
 	/* Hold uncore.lock across reset to prevent any register access
@@ -266,7 +261,7 @@ void intel_uncore_forcewake_reset(struct drm_device *dev, bool restore)
 	while (1) {
 		active_domains = 0;
 
-		for_each_fw_domain(domain, dev_priv, id) {
+		for_each_fw_domain(domain, dev_priv) {
 			if (hrtimer_cancel(&domain->timer) == 0)
 				continue;
 
@@ -275,9 +270,9 @@ void intel_uncore_forcewake_reset(struct drm_device *dev, bool restore)
 
 		spin_lock_irqsave(&dev_priv->uncore.lock, irqflags);
 
-		for_each_fw_domain(domain, dev_priv, id) {
+		for_each_fw_domain(domain, dev_priv) {
 			if (hrtimer_active(&domain->timer))
-				active_domains |= (1 << id);
+				active_domains |= domain->mask;
 		}
 
 		if (active_domains == 0)
@@ -294,9 +289,9 @@ void intel_uncore_forcewake_reset(struct drm_device *dev, bool restore)
 
 	WARN_ON(active_domains);
 
-	for_each_fw_domain(domain, dev_priv, id)
+	for_each_fw_domain(domain, dev_priv)
 		if (domain->wake_count)
-			fw |= 1 << id;
+			fw |= domain->mask;
 
 	if (fw)
 		dev_priv->uncore.funcs.force_wake_put(dev_priv, fw);
@@ -418,16 +413,15 @@ static void __intel_uncore_forcewake_get(struct drm_i915_private *dev_priv,
 					 enum forcewake_domains fw_domains)
 {
 	struct intel_uncore_forcewake_domain *domain;
-	enum forcewake_domain_id id;
 
 	if (!dev_priv->uncore.funcs.force_wake_get)
 		return;
 
 	fw_domains &= dev_priv->uncore.fw_domains;
 
-	for_each_fw_domain_mask(domain, fw_domains, dev_priv, id) {
+	for_each_fw_domain_mask(domain, fw_domains, dev_priv) {
 		if (domain->wake_count++)
-			fw_domains &= ~(1 << id);
+			fw_domains &= ~domain->mask;
 	}
 
 	if (fw_domains)
@@ -485,14 +479,13 @@ static void __intel_uncore_forcewake_put(struct drm_i915_private *dev_priv,
 					 enum forcewake_domains fw_domains)
 {
 	struct intel_uncore_forcewake_domain *domain;
-	enum forcewake_domain_id id;
 
 	if (!dev_priv->uncore.funcs.force_wake_put)
 		return;
 
 	fw_domains &= dev_priv->uncore.fw_domains;
 
-	for_each_fw_domain_mask(domain, fw_domains, dev_priv, id) {
+	for_each_fw_domain_mask(domain, fw_domains, dev_priv) {
 		if (WARN_ON(domain->wake_count == 0))
 			continue;
 
@@ -546,12 +539,11 @@ void intel_uncore_forcewake_put__locked(struct drm_i915_private *dev_priv,
 void assert_forcewakes_inactive(struct drm_i915_private *dev_priv)
 {
 	struct intel_uncore_forcewake_domain *domain;
-	enum forcewake_domain_id id;
 
 	if (!dev_priv->uncore.funcs.force_wake_get)
 		return;
 
-	for_each_fw_domain(domain, dev_priv, id)
+	for_each_fw_domain(domain, dev_priv)
 		WARN_ON(domain->wake_count);
 }
 
@@ -727,15 +719,14 @@ static inline void __force_wake_auto(struct drm_i915_private *dev_priv,
 				     enum forcewake_domains fw_domains)
 {
 	struct intel_uncore_forcewake_domain *domain;
-	enum forcewake_domain_id id;
 
 	if (WARN_ON(!fw_domains))
 		return;
 
 	/* Ideally GCC would be constant-fold and eliminate this loop */
-	for_each_fw_domain_mask(domain, fw_domains, dev_priv, id) {
+	for_each_fw_domain_mask(domain, fw_domains, dev_priv) {
 		if (domain->wake_count) {
-			fw_domains &= ~(1 << id);
+			fw_domains &= ~domain->mask;
 			continue;
 		}
 
@@ -1156,6 +1147,12 @@ static void fw_domain_init(struct drm_i915_private *dev_priv,
 	d->i915 = dev_priv;
 	d->id = domain_id;
 
+	BUILD_BUG_ON(FORCEWAKE_RENDER != (1 << FW_DOMAIN_ID_RENDER));
+	BUILD_BUG_ON(FORCEWAKE_BLITTER != (1 << FW_DOMAIN_ID_BLITTER));
+	BUILD_BUG_ON(FORCEWAKE_MEDIA != (1 << FW_DOMAIN_ID_MEDIA));
+
+	d->mask = 1 << domain_id;
+
 	hrtimer_init(&d->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
 	d->timer.function = intel_uncore_fw_release_timer;
 
-- 
1.9.1
* Re: [PATCH 2/3] drm/i915: Simplify for_each_fw_domain iterators
  2016-04-04 16:51 ` [PATCH 2/3] drm/i915: Simplify for_each_fw_domain iterators Tvrtko Ursulin
@ 2016-04-04 19:00   ` Chris Wilson
  2016-04-04 19:14   ` Dave Gordon
  1 sibling, 0 replies; 23+ messages in thread
From: Chris Wilson @ 2016-04-04 19:00 UTC (permalink / raw)
To: Tvrtko Ursulin; +Cc: Intel-gfx

On Mon, Apr 04, 2016 at 05:51:10PM +0100, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>
> As the vast majority of users does not use the domain id variable,
> we can eliminate it from the iterator and also change the latter
> using the same principle as was recently done for for_each_engine.
>
> For a couple of callers which do need the domain id, or the domain
> mask, store the later in the domain itself at init time and use
> both through it.
>
> Result is clearer code and smaller generated binary, especially
> in the tight fw get/put loops. Also, relationship between domain
> id and mask is no longer assumed in the macro.
>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Hah, I made exactly (admittedly I didn't think of adding BUILD_BUG_ON)
the same patch just as you went off on holiday and weren't around to
comment!

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
-Chris
-- 
Chris Wilson, Intel Open Source Technology Centre
* Re: [PATCH 2/3] drm/i915: Simplify for_each_fw_domain iterators
  2016-04-04 16:51 ` [PATCH 2/3] drm/i915: Simplify for_each_fw_domain iterators Tvrtko Ursulin
  2016-04-04 19:00   ` Chris Wilson
@ 2016-04-04 19:14   ` Dave Gordon
  2016-04-05  9:05     ` Tvrtko Ursulin
  1 sibling, 1 reply; 23+ messages in thread
From: Dave Gordon @ 2016-04-04 19:14 UTC (permalink / raw)
To: Tvrtko Ursulin, Intel-gfx

On 04/04/16 17:51, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>
> As the vast majority of users does not use the domain id variable,

"do not use"

> we can eliminate it from the iterator and also change the latter
> using the same principle as was recently done for for_each_engine.
>
> For a couple of callers which do need the domain id, or the domain
> mask, store the later in the domain itself at init time and use
> both through it.

"For a couple of callers which do need the domain mask, store it
in the domain array (which already has the domain id), then both
can be retrieved thence." ?

> Result is clearer code and smaller generated binary, especially
> in the tight fw get/put loops. Also, relationship between domain
> id and mask is no longer assumed in the macro.

"in the macro or elsewhere" ?

> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

A few typos & clarifications above, plus a minor quibble about a macro
name (which you haven't actually changed, but might as well fix).
Otherwise LGTM, so

Reviewed-by: Dave Gordon <david.s.gordon@intel.com>

(even if you don't change the macro name)

> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -665,6 +665,7 @@ struct intel_uncore {
>  	struct intel_uncore_forcewake_domain {
>  		struct drm_i915_private *i915;
>  		enum forcewake_domain_id id;
> +		enum forcewake_domains mask;

At present this mask will always have only one bit set, but I suppose
there might be some utility in allowing multiple bits (virtual domains?)

>  		unsigned wake_count;
>  		struct hrtimer timer;

[snip]

> +#define for_each_fw_domain_mask(domain__, mask__, dev_priv__) \
> +	for ((domain__) = &(dev_priv__)->uncore.fw_domain[0]; \
> +	     (domain__) < &(dev_priv__)->uncore.fw_domain[FW_DOMAIN_ID_COUNT]; \
> +	     (domain__)++) \
> +		for_each_if ((mask__) & (domain__)->mask)
> +

For consistency with the for_each_engine() macros, the name ought to be
"for_each_fw_domain_mask*ed*()", showing that we're iterating over a
subset selected by the 'mask' parameter, rather than iterating over all
possible masks.

.Dave.

[remainder of quoted patch snipped]
* Re: [PATCH 2/3] drm/i915: Simplify for_each_fw_domain iterators
  2016-04-04 19:14   ` Dave Gordon
@ 2016-04-05  9:05     ` Tvrtko Ursulin
  2016-04-05  9:36       ` Chris Wilson
  0 siblings, 1 reply; 23+ messages in thread
From: Tvrtko Ursulin @ 2016-04-05 9:05 UTC (permalink / raw)
To: Dave Gordon, Intel-gfx

On 04/04/16 20:14, Dave Gordon wrote:
> On 04/04/16 17:51, Tvrtko Ursulin wrote:
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> As the vast majority of users does not use the domain id variable,
>
> "do not use"

Yep.

>> we can eliminate it from the iterator and also change the latter
>> using the same principle as was recently done for for_each_engine.
>>
>> For a couple of callers which do need the domain id, or the domain
>> mask, store the later in the domain itself at init time and use
>> both through it.
>
> "For a couple of callers which do need the domain mask, store it
> in the domain array (which already has the domain id), then both
> can be retrieved thence." ?

Better yes.

>> Result is clearer code and smaller generated binary, especially
>> in the tight fw get/put loops. Also, relationship between domain
>> id and mask is no longer assumed in the macro.
>
> "in the macro or elsewhere" ?

Just in the macro, it is still hardcoded in fw_domain_init.

>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>
> A few typos & clarifications above, plus a minor quibble about a macro
> name (which you haven't actually changed, but might as well fix).
> Otherwise LGTM, so
>
> Reviewed-by: Dave Gordon <david.s.gordon@intel.com>
>
> (even if you don't change the macro name)

Thanks, consistency sounds good so I will change it.

[quoted patch snipped]

>> +		enum forcewake_domains mask;
>
> At present this mask will always have only one bit set, but I suppose
> there might be some utility in allowing multiple bits (virtual domains?)

I did not like the name mask myself but couldn't think of anything
better. Do you have a suggestion?

Regards,
Tvrtko
* Re: [PATCH 2/3] drm/i915: Simplify for_each_fw_domain iterators
  2016-04-05  9:05     ` Tvrtko Ursulin
@ 2016-04-05  9:36       ` Chris Wilson
  0 siblings, 0 replies; 23+ messages in thread
From: Chris Wilson @ 2016-04-05 9:36 UTC (permalink / raw)
To: Tvrtko Ursulin; +Cc: Intel-gfx

On Tue, Apr 05, 2016 at 10:05:24AM +0100, Tvrtko Ursulin wrote:
> On 04/04/16 20:14, Dave Gordon wrote:
> > On 04/04/16 17:51, Tvrtko Ursulin wrote:
> >> diff --git a/drivers/gpu/drm/i915/i915_drv.h
> >> b/drivers/gpu/drm/i915/i915_drv.h
> >> index 7d4c704d7d75..160f980f0368 100644
> >> --- a/drivers/gpu/drm/i915/i915_drv.h
> >> +++ b/drivers/gpu/drm/i915/i915_drv.h
> >> @@ -665,6 +665,7 @@ struct intel_uncore {
> >>  	struct intel_uncore_forcewake_domain {
> >>  		struct drm_i915_private *i915;
> >>  		enum forcewake_domain_id id;
> >> +		enum forcewake_domains mask;
> >
> > At present this mask will always have only one bit set, but I suppose
> > there might be some utility in allowing multiple bits (virtual domains?)
>
> I did not like the name mask myself but couldn't think of anything
> better. Do you have a suggestion?

mask is fine as we use it as a mask. We've called it intel_engine_flag()
which is worse imo. If there is a much better name for this, we should
also fixup that name as well.
-Chris
-- 
Chris Wilson, Intel Open Source Technology Centre
* [PATCH 3/3] drm/i915: Do not serialize forcewake acquire across domains
  2016-04-04 16:51 [PATCH 1/3] drm/i915: Use consistent forcewake auto-release timeout across kernel configs Tvrtko Ursulin
  2016-04-04 16:51 ` [PATCH 2/3] drm/i915: Simplify for_each_fw_domain iterators Tvrtko Ursulin
@ 2016-04-04 16:51 ` Tvrtko Ursulin
  2016-04-04 19:07   ` Chris Wilson
  2016-04-04 18:58 ` [PATCH 1/3] drm/i915: Use consistent forcewake auto-release timeout across kernel configs Chris Wilson
  ` (3 subsequent siblings)
  5 siblings, 1 reply; 23+ messages in thread
From: Tvrtko Ursulin @ 2016-04-04 16:51 UTC (permalink / raw)
To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

On platforms with multiple forcewake domains it seems more efficient
to request all desired ones and then to wait for acks to avoid
needlessly serializing on each domain.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/intel_uncore.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
index 49edd641b434..03674c3cfaf7 100644
--- a/drivers/gpu/drm/i915/intel_uncore.c
+++ b/drivers/gpu/drm/i915/intel_uncore.c
@@ -115,8 +115,10 @@ fw_domains_get(struct drm_i915_private *dev_priv, enum forcewake_domains fw_domains)
 	for_each_fw_domain_mask(d, fw_domains, dev_priv) {
 		fw_domain_wait_ack_clear(d);
 		fw_domain_get(d);
-		fw_domain_wait_ack(d);
 	}
+
+	for_each_fw_domain_mask(d, fw_domains, dev_priv)
+		fw_domain_wait_ack(d);
 }
 
 static void
-- 
1.9.1
* Re: [PATCH 3/3] drm/i915: Do not serialize forcewake acquire across domains 2016-04-04 16:51 ` [PATCH 3/3] drm/i915: Do not serialize forcewake acquire across domains Tvrtko Ursulin @ 2016-04-04 19:07 ` Chris Wilson 2016-04-05 9:02 ` Tvrtko Ursulin 0 siblings, 1 reply; 23+ messages in thread From: Chris Wilson @ 2016-04-04 19:07 UTC (permalink / raw) To: Tvrtko Ursulin; +Cc: Intel-gfx On Mon, Apr 04, 2016 at 05:51:11PM +0100, Tvrtko Ursulin wrote: > From: Tvrtko Ursulin <tvrtko.ursulin@intel.com> > > On platforms with multiple forcewake domains it seems more efficient > to request all desired ones and then to wait for acks to avoid > needlessly serializing on each domain. Not convinced since we have more machines with one domain than two. What I did was to compact the domains array so that we only iterated over the known set - but that feels overkill when we only have two domains today. For the same reason (only one machine with two domains), I didn't think separate functions to iterate over one domain and another to iterate over all was worth it. What you can do though is remove an excess posting read from fw_domains_put. Compared to the cost of a register access (the spinlock irq mostly) the iterator doesn't strike me as being that worthwhile an optimisation target. -Chris -- Chris Wilson, Intel Open Source Technology Centre
* Re: [PATCH 3/3] drm/i915: Do not serialize forcewake acquire across domains 2016-04-04 19:07 ` Chris Wilson @ 2016-04-05 9:02 ` Tvrtko Ursulin 2016-04-05 9:47 ` Chris Wilson 0 siblings, 1 reply; 23+ messages in thread From: Tvrtko Ursulin @ 2016-04-05 9:02 UTC (permalink / raw) To: Chris Wilson, Intel-gfx On 04/04/16 20:07, Chris Wilson wrote: > On Mon, Apr 04, 2016 at 05:51:11PM +0100, Tvrtko Ursulin wrote: >> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com> >> >> On platforms with multiple forcewake domains it seems more efficient >> to request all desired ones and then to wait for acks to avoid >> needlessly serializing on each domain. > > Not convinced since we have more machines with one domain than two. What > I did was to compact the domains array so that we only iterated over the > known set - but that feels overkill when we only have two domains today. > > For the same reason (only one machine with two domains), I didn't think > seperate functions to iterate over one domain and another to iterate > over all was worth it. > > What you can do though is remove an excess posting read from > fw_domains_put. > > Compared to the cost of a register access (the spinlock irq mostly) the > iterator doesn't strike me as being that worthwhile an optimisation > target. Correct, I thought we agreed that the majority of the CPU time attributed to fw_domains_get is from the busy spinning while waiting on the ack from the GPU. This patch is not optimising the iterator, but requests all domains to be woken up and then waits for acks. It changes the time spent busy spinning from Td1 + ... + Tdn to max(Td1...Tdn). Yes it is only interesting for platforms with more than one fw domain. But since we agreed iterator is not significant, the fact that it adds two loops* over the array should not be noticeable vs.
the gain for multi-fw domain machines (which will be more and more of as time goes by). Regards, Tvrtko * Also because 2/3 from this series has shrunk the iterator considerably; even with two iterations fw_domains_get remains pretty much the same size now with two loops, vs one loop before it.
* Re: [PATCH 3/3] drm/i915: Do not serialize forcewake acquire across domains 2016-04-05 9:02 ` Tvrtko Ursulin @ 2016-04-05 9:47 ` Chris Wilson 0 siblings, 0 replies; 23+ messages in thread From: Chris Wilson @ 2016-04-05 9:47 UTC (permalink / raw) To: Tvrtko Ursulin; +Cc: Intel-gfx On Tue, Apr 05, 2016 at 10:02:28AM +0100, Tvrtko Ursulin wrote: > > On 04/04/16 20:07, Chris Wilson wrote: > >On Mon, Apr 04, 2016 at 05:51:11PM +0100, Tvrtko Ursulin wrote: > >>From: Tvrtko Ursulin <tvrtko.ursulin@intel.com> > >> > >>On platforms with multiple forcewake domains it seems more efficient > >>to request all desired ones and then to wait for acks to avoid > >>needlessly serializing on each domain. > > > >Not convinced since we have more machines with one domain than two. What > >I did was to compact the domains array so that we only iterated over the > >known set - but that feels overkill when we only have two domains today. > > > >For the same reason (only one machine with two domains), I didn't think > >seperate functions to iterate over one domain and another to iterate > >over all was worth it. > > > >What you can do though is remove an excess posting read from > >fw_domains_put. > > > >Compared to the cost of a register access (the spinlock irq mostly) the > >iterator doesn't strike me as being that worthwhile an optimisation > >target. > > Correct, I thought we agreed that the majority of the CPU time > attributed to fw_domains_get is from the busy spinning while waiting > on the ack from the GPU. > > This patch is not optimising the iterator, but requests all domains > to be woken up and then waits for acks. It changes the time spent > busy spinning from Td1 + ... + Td2 to max(Td1...Tdn). > > Yes it is only interesting for platforms with more than one fw > domain. But since we agreed iterator is not significant, the fact > that it adds two loops* over the array should not be noticeable vs. 
> the gain for multi-fw domain machines (which will be more and more > of as time goes by). Then we should be first looking at the cases where we are requesting multiple domains to be woken where we only need one. Anyway you've changed my mind, Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> -Chris -- Chris Wilson, Intel Open Source Technology Centre
* Re: [PATCH 1/3] drm/i915: Use consistent forcewake auto-release timeout across kernel configs 2016-04-04 16:51 [PATCH 1/3] drm/i915: Use consistent forcewake auto-release timeout across kernel configs Tvrtko Ursulin 2016-04-04 16:51 ` [PATCH 2/3] drm/i915: Simplify for_each_fw_domain iterators Tvrtko Ursulin 2016-04-04 16:51 ` [PATCH 3/3] drm/i915: Do not serialize forcewake acquire across domains Tvrtko Ursulin @ 2016-04-04 18:58 ` Chris Wilson 2016-04-04 19:41 ` Dave Gordon 2016-04-05 8:54 ` Tvrtko Ursulin 1 sibling, 2 replies; 23+ messages in thread From: Chris Wilson @ 2016-04-04 18:58 UTC (permalink / raw) To: Tvrtko Ursulin; +Cc: Intel-gfx On Mon, Apr 04, 2016 at 05:51:09PM +0100, Tvrtko Ursulin wrote: > From: Tvrtko Ursulin <tvrtko.ursulin@intel.com> > > Current implementation releases the forcewake at any time between > straight away, and one jiffie from the last put, or first automatic > grab. That isn't the problem though. The problem is that we set the timer on first use rather than last use. All you are stating here is that by lengthening the timeout on your system you reduce the number of times it times out. Having said that, the conversion to hrtimer seems sensible but to add tracking of the last access as opposed to first we either fallback to jiffie (in which case hrtimer is moot) or rely on ktime_get_raw() being fast enough for every register write. Hmm, my usual response to that has been if it matters we avoid the heavyweight macros and use the _FW interface - or even raw readl/writel. Could you try storing ktime_get_raw() on every access and rearming the timer if it expires before last-access + min-period? -Chris -- Chris Wilson, Intel Open Source Technology Centre
* Re: [PATCH 1/3] drm/i915: Use consistent forcewake auto-release timeout across kernel configs 2016-04-04 18:58 ` [PATCH 1/3] drm/i915: Use consistent forcewake auto-release timeout across kernel configs Chris Wilson @ 2016-04-04 19:41 ` Dave Gordon 2016-04-04 20:22 ` Chris Wilson 2016-04-05 8:54 ` Tvrtko Ursulin 1 sibling, 1 reply; 23+ messages in thread From: Dave Gordon @ 2016-04-04 19:41 UTC (permalink / raw) To: Chris Wilson; +Cc: intel-gfx On 04/04/16 19:58, Chris Wilson wrote: > On Mon, Apr 04, 2016 at 05:51:09PM +0100, Tvrtko Ursulin wrote: >> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com> >> >> Current implementation releases the forcewake at any time between >> straight away, and one jiffie from the last put, or first automatic >> grab. > > That isn't the problem though. The problem is that we set the timer on > first use rather than last use. All you are stating here is that by > lengthening the timeout on your system you reduce the number of times it > times out. I thought that was already done, per my comments on one of your recent patches (rename __force_wake_get to __force_wake_auto) and your reply (gains are probably worth extra complexity). But of course Tvrtko wasn't here the last few days so may not have seen that. I suppose it would make sense to fold both changes together. But changing the timeout to 1ms makes it generally *shorter* than before, not longer. The gain can't just be from having a longer timeout, 'cos it isn't. (Average timeout now 1.5ms vs. previous 8ms?) I think it should come from the fact that accesses are likely to be bursty, having a bimodal distribution in the time domain. When we've made one access, we're likely to make another much sooner than 1ms later, but once we stop making accesses there will probably be far more than 1ms before the next set of accesses. 
> Having said that, the conversion to hrtimer seems sensible but to add > tracking of the last access as opposed to first we either fallback to > jiffie (in which case hrtimer is moot) or rely on ktime_get_raw() being > fast enough for every register write. Hmm, my usual response to that has > been if it matters we avoid the heavyweight macros and use the _FW > interface - or even raw readl/writel. > > Could you try storing ktime_get_raw() on every access and rearming the > timer if it expires before last-access + min-period? > -Chris One more idea - do we want two different expiry periods depending on whether the caller used the _auto version or not? If they did, they're probably anticipating "only a few" accesses in the current operation, but perhaps with no idea about when the next set of accesses will occur. If not, then the final decrement means they think they've finished for a while, and perhaps don't expect to come back for quite a long time. In that situation the timeout only helps avoid rapid off/on cycles by unrelated functions, not between logically-related operations. So would two different timeouts make sense? Or even ... _auto: longish timeout from the *first* _get() - don't rearm thereafter other: shortish timeout from the _put() - non-auto _get() will cancel or is that just too complicated? .Dave.
* Re: [PATCH 1/3] drm/i915: Use consistent forcewake auto-release timeout across kernel configs 2016-04-04 19:41 ` Dave Gordon @ 2016-04-04 20:22 ` Chris Wilson 0 siblings, 0 replies; 23+ messages in thread From: Chris Wilson @ 2016-04-04 20:22 UTC (permalink / raw) To: Dave Gordon; +Cc: intel-gfx On Mon, Apr 04, 2016 at 08:41:20PM +0100, Dave Gordon wrote: > On 04/04/16 19:58, Chris Wilson wrote: > >On Mon, Apr 04, 2016 at 05:51:09PM +0100, Tvrtko Ursulin wrote: > >>From: Tvrtko Ursulin <tvrtko.ursulin@intel.com> > >> > >>Current implementation releases the forcewake at any time between > >>straight away, and one jiffie from the last put, or first automatic > >>grab. > > > >That isn't the problem though. The problem is that we set the timer on > >first use rather than last use. All you are stating here is that by > >lengthening the timeout on your system you reduce the number of times it > >times out. > > I thought that was already done, per my comments on one of your > recent patches (rename __force_wake_get to __force_wake_auto) and > your reply (gains are probably worth extra complexity). But of > course Tvrtko wasn't here the last few days so may not have seen > that. I suppose it would make sense to fold both changes together. We were talking about the cost of having 2 passes vs 1 to do the same task. Before Tvrtko left, I realised again that we didn't defer the timer itself for every access; rearming a timer on every register access seems costly (more so switching to hrtimer). Rearming the timer inside the timer callback seems a reasonable compromise. > But changing the timeout to 1ms makes it generally *shorter* than > before, not longer. The gain can't just be from having a longer > timeout, 'cos it isn't. (Average timeout now 1.5ms vs. previous > 8ms?) It is. jiffies+1 does not equal an 8ms delay (it would be 1ms on his machine anyways, 10ms on mine).
The issue is that +1 jiffie is up to 1 jiffie and often close to 0, and that very well explains the wakeups Tvrtko measured. > I think it should come from the fact that accesses are likely to be > bursty, having a bimodal distribution in the time domain. When we've > made one access, we're likely to make another much sooner than 1ms > later, but once we stop making accesses there will probably be far > more than 1ms before the next set of accesses. Yes. We expect temporal coherence in the access pattern. That's assuredly so since we very rarely access one register by itself. > One more idea - do we want two different expiry periods depending on > whether the caller used the _auto version or not? Hard to tell. Broadly speaking we have two users of the non-auto interface: userspace disabling forcewake indefinitely, and the other is code that wants to hold the spinlock + forcewake over a bunch of register writes. > If they did, they're probably anticipating "only a few" accesses in > the current operation, but perhaps with no idea about when the next > set of accesses will occur. > > If not, then the final decrement means they think they've finished > for a while, and perhaps don't expect to come back for quite a long > time. In that situation the timeout only helps avoid rapid off/on > cycles by unrelated functions, not between logically-related > operations. Execlists. (Wants to hold forcewake over a bunch of reads and writes, and may be very frequent context switches or very slow) > So would two different timeouts make sense? Or even ... > > _auto: longish timeout from the *first* _get() - don't rearm thereafter > other: shortish timeout from the _put() - non-auto _get() will cancel > or is that just too complicated? The challenge is that I think we should be pushing for shorter timeouts, because of the bimodality in many accesses followed by a long period of sleep.
One of the original reasons for the choice of 1 jiffie with a pinned timer was that it should mean that on the next call to schedule() we should be deasserting forcewake (i.e. we only hold forcewake for this sequence of register accesses). However, execlists scuppers that by having an interrupt context that requires forcewake, and frequently so. So I think we do need a combination of something like task_work_run() to tie a forcewake reference to the active task with something like an extra timer to allow frequent wakers to avoid having to retake forcewake. -Chris -- Chris Wilson, Intel Open Source Technology Centre
* Re: [PATCH 1/3] drm/i915: Use consistent forcewake auto-release timeout across kernel configs 2016-04-04 18:58 ` [PATCH 1/3] drm/i915: Use consistent forcewake auto-release timeout across kernel configs Chris Wilson 2016-04-04 19:41 ` Dave Gordon @ 2016-04-05 8:54 ` Tvrtko Ursulin 2016-04-05 8:59 ` Chris Wilson 1 sibling, 1 reply; 23+ messages in thread From: Tvrtko Ursulin @ 2016-04-05 8:54 UTC (permalink / raw) To: Chris Wilson, Intel-gfx On 04/04/16 19:58, Chris Wilson wrote: > On Mon, Apr 04, 2016 at 05:51:09PM +0100, Tvrtko Ursulin wrote: >> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com> >> >> Current implementation releases the forcewake at any time between >> straight away, and one jiffie from the last put, or first automatic >> grab. > > That isn't the problem though. The problem is that we set the timer on > first use rather than last use. All you are stating here is that by > lengthening the timeout on your system you reduce the number of times it > times out. It is true the reduction I see is due to lengthening of the average timeout. But with regards to the re-arming approach, I thought so initially myself, but then, if we expect bursty access then it shouldn't matter and the simplicity of doing it like it currently is better. Even in practice, I noticed the effect of lengthening the timeout is much greater than moving the arming to the last access. And to get to very few to none auto releases on busy workloads we need in the region of 5ms, which would be a big change. Or maybe not if you consider HZ=100 kernels. It is very difficult to know what is actually better considering power between the CPU and GPU and performance. So I thought the best would be to keep it similar to the current timings, just fix the dependency on HZ, and also the slack with hrtimers might help with something. As a final data point, explicit puts and auto releases seem to be relatively balanced in my testing.
With this patch T-Rex for example auto-releases in the region of 3-4 times in 10ms, with around 5-10 explicit gets, and 5-10 implicit gets in 10ms. A different, interrupt heavy workload (~20k irqs/sec) manages similarly 2-4 auto-releases per 10ms, and has ~250 explicit gets and ~380 implicit per 10ms. Looking at the two I think the fact they manage to auto-release relatively similarly, compared to a huge difference in fw gets, suggests burstiness. So I am not sure that any smarts with the release period would be interesting. At least not without serious power/perf measurements. > Having said that, the conversion to hrtimer seems sensible but to add > tracking of the last access as opposed to first we either fallback to > jiffie (in which case hrtimer is moot) or rely on ktime_get_raw() being > fast enough for every register write. Hmm, my usual response to that has > been if it matters we avoid the heavyweight macros and use the _FW > interface - or even raw readl/writel. > > Could you try storing ktime_get_raw() on every access and rearming the > timer if it expires before last-access + min-period? That would be similar to your patch from before my holiday, right? As I said above, I did not notice much change with that approach. Just extending the timeout has a much greater effect, but as I wrote above, I am not sure we want to change it. Regards, Tvrtko
* Re: [PATCH 1/3] drm/i915: Use consistent forcewake auto-release timeout across kernel configs 2016-04-05 8:54 ` Tvrtko Ursulin @ 2016-04-05 8:59 ` Chris Wilson 2016-04-05 10:02 ` Tvrtko Ursulin 0 siblings, 1 reply; 23+ messages in thread From: Chris Wilson @ 2016-04-05 8:59 UTC (permalink / raw) To: Tvrtko Ursulin; +Cc: Intel-gfx On Tue, Apr 05, 2016 at 09:54:58AM +0100, Tvrtko Ursulin wrote: > > On 04/04/16 19:58, Chris Wilson wrote: > >On Mon, Apr 04, 2016 at 05:51:09PM +0100, Tvrtko Ursulin wrote: > >>From: Tvrtko Ursulin <tvrtko.ursulin@intel.com> > >> > >>Current implementation releases the forcewake at any time between > >>straight away, and one jiffie from the last put, or first automatic > >>grab. > > > >That isn't the problem though. The problem is that we set the timer on > >first use rather than last use. All you are stating here is that by > >lengthening the timeout on your system you reduce the number of times it > >times out. > > It is true the reduction I see is due lengthening of the average timeout. > > But with regards to re-arming approach, I thought so initially > myself, but then, if we expect bursty access then it shouldn't > matter and the simplicity of doing it like it currently is better. > > Even in practice, I noticed the effect of lengthening the timeout is > much greater than moving the arming to the last access. And to get > to very few to none auto releases on busy workloads we need in the > regions of 5ms, which would be a big change. Or maybe not if you > consider HZ=100 kernels. > > It is very difficult to know what is actually better considering > power between the CPU and GPU and performance. So I though the best > would be to keep it similar to the current timings, just fix the > dependency on HZ and also slack with hrtimers might help with > something. > > As a final data point, explicit puts and auto releases seems to be > relatively balanced in my testing. 
>> With this patch T-Rex for example auto-releases in the region of 3-4 times in 10ms, with around 5-10 >> explicit gets, and 5-10 implicit gets in 10ms. >> >> A different, interrupt heavy workload (~20k irqs/sec) manages >> similarly 2-4 auto-releases per 10ms, and has ~250 explicit gets and >> ~380 implicit per 10ms. >> >> Looking at the two I think the fact they manage to auto-release >> relatively similarly, compared to a huge different in fw gets, >> suggest burstyness. So I am not sure that any smarts with the >> release period would be interesting. At least not without serious >> power/perf measurements. >> >>> Having said that, the conversion to hrtimer seems sensible but to add >>> tracking of the last access as opposed to first we either fallback to >>> jiffie (in which case hrtimer is moot) or rely on ktime_get_raw() being >>> fast enough for every register write. Hmm, my usual response to that has >>> been if it matters we avoid the heavyweight macros and use the _FW >>> interface - or even raw readl/writel. >>> >>> Could you try storing ktime_get_raw() on every access and rearming the >>> timer if it expires before last-access + min-period? >> >> That would be similar to your patch from before my holiday, right? >> As I said above, I did not notice much change with that approach. >> Just extending the timeout has a much greater effect, but as I wrote >> above, I am not sure we want to change it. There was just one bug in that patch in checking the expiration that makes a huge difference, but if you could please add the discussion above to the commit log that would be invaluable. -Chris -- Chris Wilson, Intel Open Source Technology Centre
* Re: [PATCH 1/3] drm/i915: Use consistent forcewake auto-release timeout across kernel configs 2016-04-05 8:59 ` Chris Wilson @ 2016-04-05 10:02 ` Tvrtko Ursulin 2016-04-05 10:44 ` Chris Wilson 0 siblings, 1 reply; 23+ messages in thread From: Tvrtko Ursulin @ 2016-04-05 10:02 UTC (permalink / raw) To: Chris Wilson, Intel-gfx On 05/04/16 09:59, Chris Wilson wrote: > On Tue, Apr 05, 2016 at 09:54:58AM +0100, Tvrtko Ursulin wrote: >> >> On 04/04/16 19:58, Chris Wilson wrote: >>> On Mon, Apr 04, 2016 at 05:51:09PM +0100, Tvrtko Ursulin wrote: >>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com> >>>> >>>> Current implementation releases the forcewake at any time between >>>> straight away, and one jiffie from the last put, or first automatic >>>> grab. >>> >>> That isn't the problem though. The problem is that we set the timer on >>> first use rather than last use. All you are stating here is that by >>> lengthening the timeout on your system you reduce the number of times it >>> times out. >> >> It is true the reduction I see is due lengthening of the average timeout. >> >> But with regards to re-arming approach, I thought so initially >> myself, but then, if we expect bursty access then it shouldn't >> matter and the simplicity of doing it like it currently is better. >> >> Even in practice, I noticed the effect of lengthening the timeout is >> much greater than moving the arming to the last access. And to get >> to very few to none auto releases on busy workloads we need in the >> regions of 5ms, which would be a big change. Or maybe not if you >> consider HZ=100 kernels. >> >> It is very difficult to know what is actually better considering >> power between the CPU and GPU and performance. So I though the best >> would be to keep it similar to the current timings, just fix the >> dependency on HZ and also slack with hrtimers might help with >> something. >> >> As a final data point, explicit puts and auto releases seems to be >> relatively balanced in my testing. 
With this patch T-Rex for example >> auto-releases in the region of 3-4 times in 10ms, with around 5-10 >> explicit gets, and 5-10 implicit gets in 10ms. >> >> A different, interrupt heavy workload (~20k irqs/sec) manages >> similarly 2-4 auto-releases per 10ms, and has ~250 explicit gets and >> ~380 implicit per 10ms. >> >> Looking at the two I think the fact they manage to auto-release >> relatively similarly, compared to a huge different in fw gets, >> suggest burstyness. So I am not sure that any smarts with the >> release period would be interesting. At least not without serious >> power/perf measurements. >> >>> Having said that, the conversion to hrtimer seems sensible but to add >>> tracking of the last access as opposed to first we either fallback to >>> jiffie (in which case hrtimer is moot) or rely on ktime_get_raw() being >>> fast enough for every register write. Hmm, my usual response to that has >>> been if it matters we avoid the heavyweight macros and use the _FW >>> interface - or even raw readl/writel. >>> >>> Could you try storing ktime_get_raw() on every access and rearming the >>> timer if it expires before last-access + min-period? >> >> That would be similar to your patch from before my holiday, right? >> As I said above, I did not notice much change with that approach. >> Just extending the timeout has a much greater effect, but as I wrote >> above, I am not sure we want to change it. > > There was just one bug in that patch in checking the expiration that > makes a huge difference, but if you please add the discussion above to > the commit log that would be invaluable. I think I had a fixed version, would have to dig in branches now. I will add more text to the commit. Most importantly, do you think this would affect HZ=10 or HZ=250 kernels at all? Presumably if it could, someone would know today that there is a difference between the kernels based on HZ already? 
Regards, Tvrtko
* Re: [PATCH 1/3] drm/i915: Use consistent forcewake auto-release timeout across kernel configs 2016-04-05 10:02 ` Tvrtko Ursulin @ 2016-04-05 10:44 ` Chris Wilson 0 siblings, 0 replies; 23+ messages in thread From: Chris Wilson @ 2016-04-05 10:44 UTC (permalink / raw) To: Tvrtko Ursulin; +Cc: Intel-gfx On Tue, Apr 05, 2016 at 11:02:15AM +0100, Tvrtko Ursulin wrote: > > On 05/04/16 09:59, Chris Wilson wrote: > >On Tue, Apr 05, 2016 at 09:54:58AM +0100, Tvrtko Ursulin wrote: > >> > >>On 04/04/16 19:58, Chris Wilson wrote: > >>>On Mon, Apr 04, 2016 at 05:51:09PM +0100, Tvrtko Ursulin wrote: > >>>>From: Tvrtko Ursulin <tvrtko.ursulin@intel.com> > >>>> > >>>>Current implementation releases the forcewake at any time between > >>>>straight away, and one jiffie from the last put, or first automatic > >>>>grab. > >>> > >>>That isn't the problem though. The problem is that we set the timer on > >>>first use rather than last use. All you are stating here is that by > >>>lengthening the timeout on your system you reduce the number of times it > >>>times out. > >> > >>It is true the reduction I see is due lengthening of the average timeout. > >> > >>But with regards to re-arming approach, I thought so initially > >>myself, but then, if we expect bursty access then it shouldn't > >>matter and the simplicity of doing it like it currently is better. > >> > >>Even in practice, I noticed the effect of lengthening the timeout is > >>much greater than moving the arming to the last access. And to get > >>to very few to none auto releases on busy workloads we need in the > >>regions of 5ms, which would be a big change. Or maybe not if you > >>consider HZ=100 kernels. > >> > >>It is very difficult to know what is actually better considering > >>power between the CPU and GPU and performance. So I though the best > >>would be to keep it similar to the current timings, just fix the > >>dependency on HZ and also slack with hrtimers might help with > >>something. 
> >> > >>As a final data point, explicit puts and auto releases seems to be > >>relatively balanced in my testing. With this patch T-Rex for example > >>auto-releases in the region of 3-4 times in 10ms, with around 5-10 > >>explicit gets, and 5-10 implicit gets in 10ms. > >> > >>A different, interrupt heavy workload (~20k irqs/sec) manages > >>similarly 2-4 auto-releases per 10ms, and has ~250 explicit gets and > >>~380 implicit per 10ms. > >> > >>Looking at the two I think the fact they manage to auto-release > >>relatively similarly, compared to a huge different in fw gets, > >>suggest burstyness. So I am not sure that any smarts with the > >>release period would be interesting. At least not without serious > >>power/perf measurements. > >> > >>>Having said that, the conversion to hrtimer seems sensible but to add > >>>tracking of the last access as opposed to first we either fallback to > >>>jiffie (in which case hrtimer is moot) or rely on ktime_get_raw() being > >>>fast enough for every register write. Hmm, my usual response to that has > >>>been if it matters we avoid the heavyweight macros and use the _FW > >>>interface - or even raw readl/writel. > >>> > >>>Could you try storing ktime_get_raw() on every access and rearming the > >>>timer if it expires before last-access + min-period? > >> > >>That would be similar to your patch from before my holiday, right? > >>As I said above, I did not notice much change with that approach. > >>Just extending the timeout has a much greater effect, but as I wrote > >>above, I am not sure we want to change it. > > > >There was just one bug in that patch in checking the expiration that > >makes a huge difference, but if you please add the discussion above to > >the commit log that would be invaluable. > > I think I had a fixed version, would have to dig in branches now. > > I will add more text to the commit. > > Most importantly, do you think this would affect HZ=10 or HZ=250 > kernels at all? 
> Presumably if it could, someone would know today > that there is a difference between the kernels based on HZ already? Hmm, I switch quite regularly - that may explain why sometimes I see fw_domains_get quite heavy in the profiles and other times not. But I have never correlated that with changing .configs. And I don't keep stats for rc6 during stress tests - especially more notable is the lack of light desktop load testing. One interesting side-effect of hrtimer is that iirc it is installed on the local cpu and isn't run until we hit a preemption point (i.e. the scheduler in my case) so those who are running high latency low HZ are likely not affected at all! As far as hrtimer is concerned, my only real worry is that it may be too expensive - but since it is being increasingly widely used I shouldn't worry myself too much about that (if it were to be a problem, it will get fixed!). -Chris -- Chris Wilson, Intel Open Source Technology Centre
* Re: [PATCH 1/3] drm/i915: Use consistent forcewake auto-release timeout across kernel configs
  2016-04-04 16:51 [PATCH 1/3] drm/i915: Use consistent forcewake auto-release timeout across kernel configs Tvrtko Ursulin
@ 2016-04-04 18:58 ` Dave Gordon
  0 siblings, 0 replies; 23+ messages in thread
From: Dave Gordon @ 2016-04-04 18:58 UTC (permalink / raw)
To: Tvrtko Ursulin, Intel-gfx

On 04/04/16 17:51, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>
> Current implementation releases the forcewake at any time between
> straight away, and one jiffie from the last put, or first automatic
> grab.
>
> This does not sound like what was desired since jiffies are typically
> between 1 and 10ms depending on the kernel configuration.
>
> Change the auto-release mechanism to use hrtimers and set the timeout
> to 1ms with a 1ms of slack. This should make the GPU power consistent
> across kernel configs and the slack should enable some timer coallescing

"coalescing"

> where multiple force-wake domains exist, or with unrelated timers.
>
> For GlBench/T-Rex this decreases the number of forcewake releases from
> ~480 to ~300 per second, and for a heavy combined OGL/OCL test from
> ~670 to ~360.
>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

LGTM.
Reviewed-by: Dave Gordon <david.s.gordon@intel.com>

> ---
>  drivers/gpu/drm/i915/i915_drv.h     |  2 +-
>  drivers/gpu/drm/i915/intel_uncore.c | 25 ++++++++++++++++---------
>  2 files changed, 17 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index dd187727c813..7d4c704d7d75 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -666,7 +666,7 @@ struct intel_uncore {
>  		struct drm_i915_private *i915;
>  		enum forcewake_domain_id id;
>  		unsigned wake_count;
> -		struct timer_list timer;
> +		struct hrtimer timer;
>  		i915_reg_t reg_set;
>  		u32 val_set;
>  		u32 val_clear;
> diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
> index ac1c545436af..76ac990de354 100644
> --- a/drivers/gpu/drm/i915/intel_uncore.c
> +++ b/drivers/gpu/drm/i915/intel_uncore.c
> @@ -60,7 +60,11 @@ fw_domain_reset(const struct intel_uncore_forcewake_domain *d)
>  static inline void
>  fw_domain_arm_timer(struct intel_uncore_forcewake_domain *d)
>  {
> -	mod_timer_pinned(&d->timer, jiffies + 1);
> +	d->wake_count++;
> +	hrtimer_start_range_ns(&d->timer,
> +			       ktime_set(0, NSEC_PER_MSEC),
> +			       NSEC_PER_MSEC,
> +			       HRTIMER_MODE_REL);
>  }
>
>  static inline void
> @@ -224,9 +228,11 @@ static int __gen6_gt_wait_for_fifo(struct drm_i915_private *dev_priv)
>  	return ret;
>  }
>
> -static void intel_uncore_fw_release_timer(unsigned long arg)
> +static enum hrtimer_restart
> +intel_uncore_fw_release_timer(struct hrtimer *timer)
>  {
> -	struct intel_uncore_forcewake_domain *domain = (void *)arg;
> +	struct intel_uncore_forcewake_domain *domain =
> +	       container_of(timer, struct intel_uncore_forcewake_domain, timer);
>  	unsigned long irqflags;
>
>  	assert_rpm_device_not_suspended(domain->i915);
> @@ -240,6 +246,8 @@ static void intel_uncore_fw_release_timer(unsigned long arg)
>  			    1 << domain->id);
>
>  	spin_unlock_irqrestore(&domain->i915->uncore.lock, irqflags);
> +
> +	return HRTIMER_NORESTART;
>  }
>
>  void intel_uncore_forcewake_reset(struct drm_device *dev, bool restore)
> @@ -259,16 +267,16 @@ void intel_uncore_forcewake_reset(struct drm_device *dev, bool restore)
>  		active_domains = 0;
>
>  		for_each_fw_domain(domain, dev_priv, id) {
> -			if (del_timer_sync(&domain->timer) == 0)
> +			if (hrtimer_cancel(&domain->timer) == 0)
>  				continue;
>
> -			intel_uncore_fw_release_timer((unsigned long)domain);
> +			intel_uncore_fw_release_timer(&domain->timer);
>  		}
>
>  		spin_lock_irqsave(&dev_priv->uncore.lock, irqflags);
>
>  		for_each_fw_domain(domain, dev_priv, id) {
> -			if (timer_pending(&domain->timer))
> +			if (hrtimer_active(&domain->timer))
>  				active_domains |= (1 << id);
>  		}
>
> @@ -491,7 +499,6 @@ static void __intel_uncore_forcewake_put(struct drm_i915_private *dev_priv,
>  		if (--domain->wake_count)
>  			continue;
>
> -		domain->wake_count++;
>  		fw_domain_arm_timer(domain);
>  	}
>  }
> @@ -732,7 +739,6 @@ static inline void __force_wake_auto(struct drm_i915_private *dev_priv,
>  			continue;
>  		}
>
> -		domain->wake_count++;
>  		fw_domain_arm_timer(domain);
>  	}
>
> @@ -1150,7 +1156,8 @@ static void fw_domain_init(struct drm_i915_private *dev_priv,
>  	d->i915 = dev_priv;
>  	d->id = domain_id;
>
> -	setup_timer(&d->timer, intel_uncore_fw_release_timer, (unsigned long)d);
> +	hrtimer_init(&d->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
> +	d->timer.function = intel_uncore_fw_release_timer;
>
>  	dev_priv->uncore.fw_domains |= (1 << domain_id);
>
* ✗ Fi.CI.BAT: failure for series starting with [1/3] drm/i915: Use consistent forcewake auto-release timeout across kernel configs
  2016-04-04 16:51 [PATCH 1/3] drm/i915: Use consistent forcewake auto-release timeout across kernel configs Tvrtko Ursulin
@ 2016-04-05  6:26 ` Patchwork
  0 siblings, 0 replies; 23+ messages in thread
From: Patchwork @ 2016-04-05 6:26 UTC (permalink / raw)
To: Tvrtko Ursulin; +Cc: intel-gfx

== Series Details ==

Series: series starting with [1/3] drm/i915: Use consistent forcewake auto-release timeout across kernel configs
URL   : https://patchwork.freedesktop.org/series/5282/
State : failure

== Summary ==

Series 5282v1 Series without cover letter
http://patchwork.freedesktop.org/api/1.0/series/5282/revisions/1/mbox/

Test drv_hangman:
        Subgroup error-state-basic:
                pass       -> FAIL       (ilk-hp8440p)
Test kms_force_connector_basic:
        Subgroup prune-stale-modes:
                skip       -> PASS       (ilk-hp8440p)
Test kms_pipe_crc_basic:
        Subgroup suspend-read-crc-pipe-b:
                incomplete -> PASS       (hsw-gt2)

bdw-nuci7        total:196  pass:184  dwarn:0   dfail:0   fail:0   skip:12
bdw-ultra        total:196  pass:175  dwarn:0   dfail:0   fail:0   skip:21
bsw-nuc-2        total:196  pass:159  dwarn:0   dfail:0   fail:0   skip:37
byt-nuc          total:196  pass:161  dwarn:0   dfail:0   fail:0   skip:35
hsw-brixbox      total:196  pass:174  dwarn:0   dfail:0   fail:0   skip:22
hsw-gt2          total:196  pass:179  dwarn:0   dfail:0   fail:0   skip:17
ilk-hp8440p      total:196  pass:131  dwarn:0   dfail:0   fail:1   skip:64
ivb-t430s        total:196  pass:171  dwarn:0   dfail:0   fail:0   skip:25
skl-i7k-2        total:196  pass:173  dwarn:0   dfail:0   fail:0   skip:23
snb-dellxps      total:196  pass:162  dwarn:0   dfail:0   fail:0   skip:34
snb-x220t        total:196  pass:162  dwarn:0   dfail:0   fail:1   skip:33

Results at /archive/results/CI_IGT_test/Patchwork_1793/

aedfaaef290af9c8df7d9f4adf22cbe21704d091 drm-intel-nightly: 2016y-04m-04d-13h-09m-54s UTC integration manifest
51638b310bc8dcb6536bbb105efaa8c03eb54463 drm/i915: Do not serialize forcewake acquire across domains
5b17ed47669e2d3be296c473240e57db3d27adea drm/i915: Simplify for_each_fw_domain iterators
a3e632b7f4a64695a3601c8b0ef4fbade564507c drm/i915: Use consistent forcewake auto-release timeout across kernel configs
* [PATCH v2 1/3] drm/i915: Use consistent forcewake auto-release timeout across kernel configs
  2016-04-04 16:51 [PATCH 1/3] drm/i915: Use consistent forcewake auto-release timeout across kernel configs Tvrtko Ursulin
@ 2016-04-05 11:01 ` Tvrtko Ursulin
  2016-04-05 11:01 ` [PATCH v2 2/3] drm/i915: Simplify for_each_fw_domain iterators Tvrtko Ursulin
  ` (2 more replies)
  5 siblings, 3 replies; 23+ messages in thread
From: Tvrtko Ursulin @ 2016-04-05 11:01 UTC (permalink / raw)
To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Because it is based on jiffies, the current implementation releases the
forcewake at any time between straight away and 1ms to 10ms later,
depending on the kernel configuration (CONFIG_HZ). This is probably not
what was desired, since the dynamics of keeping parts of the GPU awake
should not be correlated with this kernel configuration parameter.

Change the auto-release mechanism to use hrtimers and set the timeout
to 1ms with 1ms of slack. This should make the GPU power consistent
across kernel configs, and the timer slack should enable some timer
coalescing where multiple force-wake domains exist, or with unrelated
timers.

For GlBench/T-Rex this decreases the number of forcewake releases from
~480 to ~300 per second, and for a heavy combined OGL/OCL test from
~670 to ~360 (HZ=1000 kernel). Even though this reduction can be
attributed to the average release period extending from 0-1ms to 1-2ms,
as discussed above, it will make the forcewake timeout consistent for
different CONFIG_HZ values.

Real life measurements with the above workloads have shown that, with
this patch, both manage to auto-release the forcewake between 2-4 times
per 10ms, even though the number of forcewake gets is dramatically
different. T-Rex requests between 5-10 explicit gets and 5-10 implicit
gets in each 10ms period, while the OGL/OCL test requests 250 and 380
times in the same period.

The two data points together suggest that the nature of the forcewake
accesses is bursty, and that further changes and potential timeout
extensions, or moving the start of the timeout from the first to the
last automatic forcewake grab, should be carefully measured for power
and performance effects.

v2:
  * Commit message spelling. (Dave Gordon)
  * More discussion on numbers in the commit. (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Dave Gordon <david.s.gordon@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h     |  2 +-
 drivers/gpu/drm/i915/intel_uncore.c | 25 ++++++++++++++++---------
 2 files changed, 17 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index dd187727c813..7d4c704d7d75 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -666,7 +666,7 @@ struct intel_uncore {
 		struct drm_i915_private *i915;
 		enum forcewake_domain_id id;
 		unsigned wake_count;
-		struct timer_list timer;
+		struct hrtimer timer;
 		i915_reg_t reg_set;
 		u32 val_set;
 		u32 val_clear;
diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
index ac1c545436af..76ac990de354 100644
--- a/drivers/gpu/drm/i915/intel_uncore.c
+++ b/drivers/gpu/drm/i915/intel_uncore.c
@@ -60,7 +60,11 @@ fw_domain_reset(const struct intel_uncore_forcewake_domain *d)
 static inline void
 fw_domain_arm_timer(struct intel_uncore_forcewake_domain *d)
 {
-	mod_timer_pinned(&d->timer, jiffies + 1);
+	d->wake_count++;
+	hrtimer_start_range_ns(&d->timer,
+			       ktime_set(0, NSEC_PER_MSEC),
+			       NSEC_PER_MSEC,
+			       HRTIMER_MODE_REL);
 }

 static inline void
@@ -224,9 +228,11 @@ static int __gen6_gt_wait_for_fifo(struct drm_i915_private *dev_priv)
 	return ret;
 }

-static void intel_uncore_fw_release_timer(unsigned long arg)
+static enum hrtimer_restart
+intel_uncore_fw_release_timer(struct hrtimer *timer)
 {
-	struct intel_uncore_forcewake_domain *domain = (void *)arg;
+	struct intel_uncore_forcewake_domain *domain =
+	       container_of(timer, struct intel_uncore_forcewake_domain, timer);
 	unsigned long irqflags;

 	assert_rpm_device_not_suspended(domain->i915);
@@ -240,6 +246,8 @@ static void intel_uncore_fw_release_timer(unsigned long arg)
 			    1 << domain->id);

 	spin_unlock_irqrestore(&domain->i915->uncore.lock, irqflags);
+
+	return HRTIMER_NORESTART;
 }

 void intel_uncore_forcewake_reset(struct drm_device *dev, bool restore)
@@ -259,16 +267,16 @@ void intel_uncore_forcewake_reset(struct drm_device *dev, bool restore)
 		active_domains = 0;

 		for_each_fw_domain(domain, dev_priv, id) {
-			if (del_timer_sync(&domain->timer) == 0)
+			if (hrtimer_cancel(&domain->timer) == 0)
 				continue;

-			intel_uncore_fw_release_timer((unsigned long)domain);
+			intel_uncore_fw_release_timer(&domain->timer);
 		}

 		spin_lock_irqsave(&dev_priv->uncore.lock, irqflags);

 		for_each_fw_domain(domain, dev_priv, id) {
-			if (timer_pending(&domain->timer))
+			if (hrtimer_active(&domain->timer))
 				active_domains |= (1 << id);
 		}

@@ -491,7 +499,6 @@ static void __intel_uncore_forcewake_put(struct drm_i915_private *dev_priv,
 		if (--domain->wake_count)
 			continue;

-		domain->wake_count++;
 		fw_domain_arm_timer(domain);
 	}
 }
@@ -732,7 +739,6 @@ static inline void __force_wake_auto(struct drm_i915_private *dev_priv,
 			continue;
 		}

-		domain->wake_count++;
 		fw_domain_arm_timer(domain);
 	}

@@ -1150,7 +1156,8 @@ static void fw_domain_init(struct drm_i915_private *dev_priv,
 	d->i915 = dev_priv;
 	d->id = domain_id;

-	setup_timer(&d->timer, intel_uncore_fw_release_timer, (unsigned long)d);
+	hrtimer_init(&d->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+	d->timer.function = intel_uncore_fw_release_timer;

 	dev_priv->uncore.fw_domains |= (1 << domain_id);

--
1.9.1
* [PATCH v2 2/3] drm/i915: Simplify for_each_fw_domain iterators
  2016-04-05 11:01 ` [PATCH v2 1/3] drm/i915: Use consistent forcewake auto-release timeout across kernel configs Tvrtko Ursulin
@ 2016-04-05 11:01 ` Tvrtko Ursulin
  0 siblings, 0 replies; 23+ messages in thread
From: Tvrtko Ursulin @ 2016-04-05 11:01 UTC (permalink / raw)
To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

As the vast majority of users do not use the domain id variable, we can
eliminate it from the iterator and also change the latter using the
same principle as was recently done for for_each_engine.

For a couple of callers which do need the domain mask, store it in the
domain array (which already has the domain id), then both can be
retrieved thence.

Result is clearer code and smaller generated binary, especially in the
tight fw get/put loops. Also, the relationship between domain id and
mask is no longer assumed in the macro.

v2: Improve grammar in the commit message and rename the iterator to
    for_each_fw_domain_masked for consistency. (Dave Gordon)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Dave Gordon <david.s.gordon@intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c |  5 ++---
 drivers/gpu/drm/i915/i915_drv.h     | 17 +++++++-------
 drivers/gpu/drm/i915/intel_uncore.c | 45 +++++++++++++++++--------------------
 3 files changed, 32 insertions(+), 35 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index a2e3af02c292..06fce014d0b4 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1464,12 +1464,11 @@ static int i915_forcewake_domains(struct seq_file *m, void *data)
 	struct drm_device *dev = node->minor->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct intel_uncore_forcewake_domain *fw_domain;
-	int i;

 	spin_lock_irq(&dev_priv->uncore.lock);
-	for_each_fw_domain(fw_domain, dev_priv, i) {
+	for_each_fw_domain(fw_domain, dev_priv) {
 		seq_printf(m, "%s.wake_count = %u\n",
-			   intel_uncore_forcewake_domain_to_str(i),
+			   intel_uncore_forcewake_domain_to_str(fw_domain->id),
 			   fw_domain->wake_count);
 	}
 	spin_unlock_irq(&dev_priv->uncore.lock);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 7d4c704d7d75..fff025071a0f 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -665,6 +665,7 @@ struct intel_uncore {
 	struct intel_uncore_forcewake_domain {
 		struct drm_i915_private *i915;
 		enum forcewake_domain_id id;
+		enum forcewake_domains mask;
 		unsigned wake_count;
 		struct hrtimer timer;
 		i915_reg_t reg_set;
@@ -679,14 +680,14 @@ struct intel_uncore {
 };

 /* Iterate over initialised fw domains */
-#define for_each_fw_domain_mask(domain__, mask__, dev_priv__, i__) \
-	for ((i__) = 0, (domain__) = &(dev_priv__)->uncore.fw_domain[0]; \
-	     (i__) < FW_DOMAIN_ID_COUNT; \
-	     (i__)++, (domain__) = &(dev_priv__)->uncore.fw_domain[i__]) \
-		for_each_if (((mask__) & (dev_priv__)->uncore.fw_domains) & (1 << (i__)))
-
-#define for_each_fw_domain(domain__, dev_priv__, i__) \
-	for_each_fw_domain_mask(domain__, FORCEWAKE_ALL, dev_priv__, i__)
+#define for_each_fw_domain_masked(domain__, mask__, dev_priv__) \
+	for ((domain__) = &(dev_priv__)->uncore.fw_domain[0]; \
+	     (domain__) < &(dev_priv__)->uncore.fw_domain[FW_DOMAIN_ID_COUNT]; \
+	     (domain__)++) \
+		for_each_if ((mask__) & (domain__)->mask)
+
+#define for_each_fw_domain(domain__, dev_priv__) \
+	for_each_fw_domain_masked(domain__, FORCEWAKE_ALL, dev_priv__)

 #define CSR_VERSION(major, minor)	((major) << 16 | (minor))
 #define CSR_VERSION_MAJOR(version)	((version) >> 16)
diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
index 76ac990de354..5b9c307c5222 100644
--- a/drivers/gpu/drm/i915/intel_uncore.c
+++ b/drivers/gpu/drm/i915/intel_uncore.c
@@ -111,9 +111,8 @@ static void
 fw_domains_get(struct drm_i915_private *dev_priv, enum forcewake_domains fw_domains)
 {
 	struct intel_uncore_forcewake_domain *d;
-	enum forcewake_domain_id id;

-	for_each_fw_domain_mask(d, fw_domains, dev_priv, id) {
+	for_each_fw_domain_masked(d, fw_domains, dev_priv) {
 		fw_domain_wait_ack_clear(d);
 		fw_domain_get(d);
 		fw_domain_wait_ack(d);
@@ -124,9 +123,8 @@ static void
 fw_domains_put(struct drm_i915_private *dev_priv, enum forcewake_domains fw_domains)
 {
 	struct intel_uncore_forcewake_domain *d;
-	enum forcewake_domain_id id;

-	for_each_fw_domain_mask(d, fw_domains, dev_priv, id) {
+	for_each_fw_domain_masked(d, fw_domains, dev_priv) {
 		fw_domain_put(d);
 		fw_domain_posting_read(d);
 	}
@@ -136,10 +134,9 @@ static void
 fw_domains_posting_read(struct drm_i915_private *dev_priv)
 {
 	struct intel_uncore_forcewake_domain *d;
-	enum forcewake_domain_id id;

 	/* No need to do for all, just do for first found */
-	for_each_fw_domain(d, dev_priv, id) {
+	for_each_fw_domain(d, dev_priv) {
 		fw_domain_posting_read(d);
 		break;
 	}
@@ -149,12 +146,11 @@ static void fw_domains_reset(struct drm_i915_private *dev_priv,
 			     enum forcewake_domains fw_domains)
 {
 	struct intel_uncore_forcewake_domain *d;
-	enum forcewake_domain_id id;

 	if (dev_priv->uncore.fw_domains == 0)
 		return;

-	for_each_fw_domain_mask(d, fw_domains, dev_priv, id)
+	for_each_fw_domain_masked(d, fw_domains, dev_priv)
 		fw_domain_reset(d);

 	fw_domains_posting_read(dev_priv);
@@ -256,7 +252,6 @@ void intel_uncore_forcewake_reset(struct drm_device *dev, bool restore)
 	unsigned long irqflags;
 	struct intel_uncore_forcewake_domain *domain;
 	int retry_count = 100;
-	enum forcewake_domain_id id;
 	enum forcewake_domains fw = 0, active_domains;

 	/* Hold uncore.lock across reset to prevent any register access
@@ -266,7 +261,7 @@ void intel_uncore_forcewake_reset(struct drm_device *dev, bool restore)
 	while (1) {
 		active_domains = 0;

-		for_each_fw_domain(domain, dev_priv, id) {
+		for_each_fw_domain(domain, dev_priv) {
 			if (hrtimer_cancel(&domain->timer) == 0)
 				continue;

@@ -275,9 +270,9 @@ void intel_uncore_forcewake_reset(struct drm_device *dev, bool restore)
 		spin_lock_irqsave(&dev_priv->uncore.lock, irqflags);

-		for_each_fw_domain(domain, dev_priv, id) {
+		for_each_fw_domain(domain, dev_priv) {
 			if (hrtimer_active(&domain->timer))
-				active_domains |= (1 << id);
+				active_domains |= domain->mask;
 		}

 		if (active_domains == 0)
@@ -294,9 +289,9 @@ void intel_uncore_forcewake_reset(struct drm_device *dev, bool restore)
 	WARN_ON(active_domains);

-	for_each_fw_domain(domain, dev_priv, id)
+	for_each_fw_domain(domain, dev_priv)
 		if (domain->wake_count)
-			fw |= 1 << id;
+			fw |= domain->mask;

 	if (fw)
 		dev_priv->uncore.funcs.force_wake_put(dev_priv, fw);
@@ -418,16 +413,15 @@ static void __intel_uncore_forcewake_get(struct drm_i915_private *dev_priv,
 					 enum forcewake_domains fw_domains)
 {
 	struct intel_uncore_forcewake_domain *domain;
-	enum forcewake_domain_id id;

 	if (!dev_priv->uncore.funcs.force_wake_get)
 		return;

 	fw_domains &= dev_priv->uncore.fw_domains;

-	for_each_fw_domain_mask(domain, fw_domains, dev_priv, id) {
+	for_each_fw_domain_masked(domain, fw_domains, dev_priv) {
 		if (domain->wake_count++)
-			fw_domains &= ~(1 << id);
+			fw_domains &= ~domain->mask;
 	}

 	if (fw_domains)
@@ -485,14 +479,13 @@ static void __intel_uncore_forcewake_put(struct drm_i915_private *dev_priv,
 					 enum forcewake_domains fw_domains)
 {
 	struct intel_uncore_forcewake_domain *domain;
-	enum forcewake_domain_id id;

 	if (!dev_priv->uncore.funcs.force_wake_put)
 		return;

 	fw_domains &= dev_priv->uncore.fw_domains;

-	for_each_fw_domain_mask(domain, fw_domains, dev_priv, id) {
+	for_each_fw_domain_masked(domain, fw_domains, dev_priv) {
 		if (WARN_ON(domain->wake_count == 0))
 			continue;

@@ -546,12 +539,11 @@ void intel_uncore_forcewake_put__locked(struct drm_i915_private *dev_priv,
 void assert_forcewakes_inactive(struct drm_i915_private *dev_priv)
 {
 	struct intel_uncore_forcewake_domain *domain;
-	enum forcewake_domain_id id;

 	if (!dev_priv->uncore.funcs.force_wake_get)
 		return;

-	for_each_fw_domain(domain, dev_priv, id)
+	for_each_fw_domain(domain, dev_priv)
 		WARN_ON(domain->wake_count);
 }

@@ -727,15 +719,14 @@ static inline void __force_wake_auto(struct drm_i915_private *dev_priv,
 				     enum forcewake_domains fw_domains)
 {
 	struct intel_uncore_forcewake_domain *domain;
-	enum forcewake_domain_id id;

 	if (WARN_ON(!fw_domains))
 		return;

 	/* Ideally GCC would be constant-fold and eliminate this loop */
-	for_each_fw_domain_mask(domain, fw_domains, dev_priv, id) {
+	for_each_fw_domain_masked(domain, fw_domains, dev_priv) {
 		if (domain->wake_count) {
-			fw_domains &= ~(1 << id);
+			fw_domains &= ~domain->mask;
 			continue;
 		}

@@ -1156,6 +1147,12 @@ static void fw_domain_init(struct drm_i915_private *dev_priv,
 	d->i915 = dev_priv;
 	d->id = domain_id;

+	BUILD_BUG_ON(FORCEWAKE_RENDER != (1 << FW_DOMAIN_ID_RENDER));
+	BUILD_BUG_ON(FORCEWAKE_BLITTER != (1 << FW_DOMAIN_ID_BLITTER));
+	BUILD_BUG_ON(FORCEWAKE_MEDIA != (1 << FW_DOMAIN_ID_MEDIA));
+
+	d->mask = 1 << domain_id;
+
 	hrtimer_init(&d->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
 	d->timer.function = intel_uncore_fw_release_timer;
--
1.9.1
* [PATCH v2 3/3] drm/i915: Do not serialize forcewake acquire across domains
  2016-04-05 11:01 ` [PATCH v2 1/3] drm/i915: Use consistent forcewake auto-release timeout across kernel configs Tvrtko Ursulin
@ 2016-04-05 11:01 ` Tvrtko Ursulin
  0 siblings, 0 replies; 23+ messages in thread
From: Tvrtko Ursulin @ 2016-04-05 11:01 UTC (permalink / raw)
To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

On platforms with multiple forcewake domains it seems more efficient
to request all desired ones and then to wait for acks, to avoid
needlessly serializing on each domain.

v2: Rebase.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_uncore.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
index 5b9c307c5222..899cb71914c7 100644
--- a/drivers/gpu/drm/i915/intel_uncore.c
+++ b/drivers/gpu/drm/i915/intel_uncore.c
@@ -115,8 +115,10 @@ fw_domains_get(struct drm_i915_private *dev_priv, enum forcewake_domains fw_doma
 	for_each_fw_domain_masked(d, fw_domains, dev_priv) {
 		fw_domain_wait_ack_clear(d);
 		fw_domain_get(d);
-		fw_domain_wait_ack(d);
 	}
+
+	for_each_fw_domain_masked(d, fw_domains, dev_priv)
+		fw_domain_wait_ack(d);
 }

 static void
--
1.9.1
* Re: [PATCH v2 1/3] drm/i915: Use consistent forcewake auto-release timeout across kernel configs
  2016-04-05 11:01 ` [PATCH v2 1/3] drm/i915: Use consistent forcewake auto-release timeout across kernel configs Tvrtko Ursulin
@ 2016-04-05 11:22 ` Chris Wilson
  0 siblings, 0 replies; 23+ messages in thread
From: Chris Wilson @ 2016-04-05 11:22 UTC (permalink / raw)
To: Tvrtko Ursulin; +Cc: Intel-gfx

On Tue, Apr 05, 2016 at 12:01:10PM +0100, Tvrtko Ursulin wrote:
> 		for_each_fw_domain(domain, dev_priv, id) {
> -			if (del_timer_sync(&domain->timer) == 0)
> +			if (hrtimer_cancel(&domain->timer) == 0)
> 				continue;

Had to double check that this was consistent: 0 => not running, 1 =>
pending, and so we need to manually release the timer.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
-Chris

--
Chris Wilson, Intel Open Source Technology Centre
end of thread, other threads:[~2016-04-05 11:22 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-04-04 16:51 [PATCH 1/3] drm/i915: Use consistent forcewake auto-release timeout across kernel configs Tvrtko Ursulin
2016-04-04 16:51 ` [PATCH 2/3] drm/i915: Simplify for_each_fw_domain iterators Tvrtko Ursulin
2016-04-04 19:00   ` Chris Wilson
2016-04-04 19:14     ` Dave Gordon
2016-04-05  9:05       ` Tvrtko Ursulin
2016-04-05  9:36         ` Chris Wilson
2016-04-04 16:51 ` [PATCH 3/3] drm/i915: Do not serialize forcewake acquire across domains Tvrtko Ursulin
2016-04-04 19:07   ` Chris Wilson
2016-04-05  9:02     ` Tvrtko Ursulin
2016-04-05  9:47       ` Chris Wilson
2016-04-04 18:58 ` [PATCH 1/3] drm/i915: Use consistent forcewake auto-release timeout across kernel configs Chris Wilson
2016-04-04 19:41   ` Dave Gordon
2016-04-04 20:22     ` Chris Wilson
2016-04-05  8:54       ` Tvrtko Ursulin
2016-04-05  8:59         ` Chris Wilson
2016-04-05 10:02           ` Tvrtko Ursulin
2016-04-05 10:44             ` Chris Wilson
2016-04-04 18:58 ` Dave Gordon
2016-04-05  6:26 ` ✗ Fi.CI.BAT: failure for series starting with [1/3] " Patchwork
2016-04-05 11:01 ` [PATCH v2 1/3] " Tvrtko Ursulin
2016-04-05 11:01   ` [PATCH v2 2/3] drm/i915: Simplify for_each_fw_domain iterators Tvrtko Ursulin
2016-04-05 11:01   ` [PATCH v2 3/3] drm/i915: Do not serialize forcewake acquire across domains Tvrtko Ursulin
2016-04-05 11:22   ` [PATCH v2 1/3] drm/i915: Use consistent forcewake auto-release timeout across kernel configs Chris Wilson