public inbox for intel-gfx@lists.freedesktop.org
* [PATCH 1/2] drm/i915: Wait for the struct_mutex on idling
@ 2019-04-30  9:44 Chris Wilson
  2019-04-30  9:44 ` [PATCH 2/2] drm/i915: Cancel retire_worker on parking Chris Wilson
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Chris Wilson @ 2019-04-30  9:44 UTC (permalink / raw)
  To: intel-gfx

When the system is idling, contention for struct_mutex should be low, so
it is more efficient to wait for a contended mutex than to reschedule
the work.
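
The two strategies can be compared in a small userspace sketch. This is
only an analogy using POSIX threads, not the driver code: struct device,
idle_work_trylock() and idle_work_blocking() are hypothetical names, the
pthread mutex stands in for struct_mutex, and the counters exist purely
to make the behaviour observable.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the driver state; not the i915 structures. */
struct device {
	pthread_mutex_t lock;	/* plays the role of struct_mutex */
	int parked;		/* times the idle work ran to completion */
	int deferred;		/* times the idle work gave up and deferred */
};

/*
 * Old strategy: if the lock is contended, give up and report that the
 * caller has to re-arm its timer and come back later.
 */
static bool idle_work_trylock(struct device *dev)
{
	if (pthread_mutex_trylock(&dev->lock) != 0) {
		dev->deferred++;	/* illustration only, not thread-safe */
		return false;
	}

	dev->parked++;
	pthread_mutex_unlock(&dev->lock);
	return true;
}

/* New strategy: sleep on the lock until the current holder releases it. */
static void idle_work_blocking(struct device *dev)
{
	pthread_mutex_lock(&dev->lock);
	dev->parked++;
	pthread_mutex_unlock(&dev->lock);
}
```

With the trylock variant every contended attempt costs another timer
period before the device can go idle; the blocking variant simply waits
for the holder, which is cheap when contention is expected to be low.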

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_pm.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_pm.c b/drivers/gpu/drm/i915/i915_gem_pm.c
index 3554d55dae35..3b6e8d5be8e1 100644
--- a/drivers/gpu/drm/i915/i915_gem_pm.c
+++ b/drivers/gpu/drm/i915/i915_gem_pm.c
@@ -47,13 +47,7 @@ static void idle_work_handler(struct work_struct *work)
 	struct drm_i915_private *i915 =
 		container_of(work, typeof(*i915), gem.idle_work.work);
 
-	if (!mutex_trylock(&i915->drm.struct_mutex)) {
-		/* Currently busy, come back later */
-		mod_delayed_work(i915->wq,
-				 &i915->gem.idle_work,
-				 msecs_to_jiffies(50));
-		return;
-	}
+	mutex_lock(&i915->drm.struct_mutex);
 
 	intel_wakeref_lock(&i915->gt.wakeref);
 	if (!intel_wakeref_active(&i915->gt.wakeref))
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


* [PATCH 2/2] drm/i915: Cancel retire_worker on parking
  2019-04-30  9:44 [PATCH 1/2] drm/i915: Wait for the struct_mutex on idling Chris Wilson
@ 2019-04-30  9:44 ` Chris Wilson
  2019-04-30 13:22   ` Tvrtko Ursulin
  2019-04-30 12:33 ` [PATCH 1/2] drm/i915: Wait for the struct_mutex on idling Tvrtko Ursulin
  2019-04-30 13:49 ` ✗ Fi.CI.BAT: failure for series starting with [1/2] " Patchwork
  2 siblings, 1 reply; 5+ messages in thread
From: Chris Wilson @ 2019-04-30  9:44 UTC (permalink / raw)
  To: intel-gfx

Replace the racy continuation check within retire_work with a definite
kill-switch on idling. The race was being exposed by gem_concurrent_blit
where the retire_worker would be terminated too early leaving us
spinning in debugfs/i915_drop_caches with nothing flushing the
retirement queue.
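
The change in control flow can be modelled in a few lines of plain C.
This is a single-threaded sketch under stated assumptions: struct model
and the three functions are hypothetical, the "active" flag stands in for
intel_wakeref_active(), and the model does not reproduce the race itself,
which needs real concurrency.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model state; these are not the i915 structures. */
struct model {
	bool active;		/* stands in for intel_wakeref_active() */
	bool retire_armed;	/* is the retire work queued to run again? */
	bool parked;		/* has the device been parked? */
};

/* Old scheme: the retire worker decides for itself whether to continue. */
static void retire_old(struct model *m)
{
	m->retire_armed = m->active;
}

/* New scheme: the retire worker always re-arms itself. */
static void retire_new(struct model *m)
{
	m->retire_armed = true;
}

/* New scheme: the idle worker is the only place that stops retirement. */
static void idle_new(struct model *m)
{
	m->retire_armed = false;	/* analogue of cancel_delayed_work_sync() */

	if (!m->active)
		m->parked = true;	/* park; retirement stays cancelled */
	else
		m->retire_armed = true;	/* still busy: restart retirement */
}
```

In the old scheme a single sample of the flag at the wrong moment ends
retirement for good; in the new scheme only the idle worker can stop it,
and it restarts retirement whenever it finds the device still in use.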

That said, the igt trying to idle from one child while submitting from
another may be a contributing factor as to why it runs so slowly...

Testcase: igt/gem_concurrent_blit
Fixes: 79ffac8599c4 ("drm/i915: Invert the GEM wakeref hierarchy")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_pm.c            | 27 ++++++++++---------
 .../gpu/drm/i915/selftests/mock_gem_device.c  |  3 +--
 2 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_pm.c b/drivers/gpu/drm/i915/i915_gem_pm.c
index 3b6e8d5be8e1..88be810758ae 100644
--- a/drivers/gpu/drm/i915/i915_gem_pm.c
+++ b/drivers/gpu/drm/i915/i915_gem_pm.c
@@ -46,15 +46,23 @@ static void idle_work_handler(struct work_struct *work)
 {
 	struct drm_i915_private *i915 =
 		container_of(work, typeof(*i915), gem.idle_work.work);
+	bool restart = true;
 
+	cancel_delayed_work_sync(&i915->gem.retire_work);
 	mutex_lock(&i915->drm.struct_mutex);
 
 	intel_wakeref_lock(&i915->gt.wakeref);
-	if (!intel_wakeref_active(&i915->gt.wakeref))
+	if (!intel_wakeref_active(&i915->gt.wakeref)) {
 		i915_gem_park(i915);
+		restart = false;
+	}
 	intel_wakeref_unlock(&i915->gt.wakeref);
 
 	mutex_unlock(&i915->drm.struct_mutex);
+	if (restart)
+		queue_delayed_work(i915->wq,
+				   &i915->gem.retire_work,
+				   round_jiffies_up_relative(HZ));
 }
 
 static void retire_work_handler(struct work_struct *work)
@@ -68,10 +76,9 @@ static void retire_work_handler(struct work_struct *work)
 		mutex_unlock(&i915->drm.struct_mutex);
 	}
 
-	if (intel_wakeref_active(&i915->gt.wakeref))
-		queue_delayed_work(i915->wq,
-				   &i915->gem.retire_work,
-				   round_jiffies_up_relative(HZ));
+	queue_delayed_work(i915->wq,
+			   &i915->gem.retire_work,
+			   round_jiffies_up_relative(HZ));
 }
 
 static int pm_notifier(struct notifier_block *nb,
@@ -159,15 +166,9 @@ void i915_gem_suspend(struct drm_i915_private *i915)
 	 * reset the GPU back to its idle, low power state.
 	 */
 	GEM_BUG_ON(i915->gt.awake);
-	cancel_delayed_work_sync(&i915->gpu_error.hangcheck_work);
-
-	drain_delayed_work(&i915->gem.retire_work);
+	flush_delayed_work(&i915->gem.idle_work);
 
-	/*
-	 * As the idle_work is rearming if it detects a race, play safe and
-	 * repeat the flush until it is definitely idle.
-	 */
-	drain_delayed_work(&i915->gem.idle_work);
+	cancel_delayed_work_sync(&i915->gpu_error.hangcheck_work);
 
 	i915_gem_drain_freed_objects(i915);
 
diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
index e4033d0576c4..ce54f8dc13cc 100644
--- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
+++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
@@ -58,8 +58,7 @@ static void mock_device_release(struct drm_device *dev)
 	i915_gem_contexts_lost(i915);
 	mutex_unlock(&i915->drm.struct_mutex);
 
-	drain_delayed_work(&i915->gem.retire_work);
-	drain_delayed_work(&i915->gem.idle_work);
+	flush_delayed_work(&i915->gem.idle_work);
 	i915_gem_drain_workqueue(i915);
 
 	mutex_lock(&i915->drm.struct_mutex);
-- 
2.20.1


* Re: [PATCH 1/2] drm/i915: Wait for the struct_mutex on idling
  2019-04-30  9:44 [PATCH 1/2] drm/i915: Wait for the struct_mutex on idling Chris Wilson
  2019-04-30  9:44 ` [PATCH 2/2] drm/i915: Cancel retire_worker on parking Chris Wilson
@ 2019-04-30 12:33 ` Tvrtko Ursulin
  2019-04-30 13:49 ` ✗ Fi.CI.BAT: failure for series starting with [1/2] " Patchwork
  2 siblings, 0 replies; 5+ messages in thread
From: Tvrtko Ursulin @ 2019-04-30 12:33 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 30/04/2019 10:44, Chris Wilson wrote:
> When the system is idling, contention for struct_mutex should be low, so
> it is more efficient to wait for a contended mutex than to reschedule
> the work.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_gem_pm.c | 8 +-------
>   1 file changed, 1 insertion(+), 7 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_pm.c b/drivers/gpu/drm/i915/i915_gem_pm.c
> index 3554d55dae35..3b6e8d5be8e1 100644
> --- a/drivers/gpu/drm/i915/i915_gem_pm.c
> +++ b/drivers/gpu/drm/i915/i915_gem_pm.c
> @@ -47,13 +47,7 @@ static void idle_work_handler(struct work_struct *work)
>   	struct drm_i915_private *i915 =
>   		container_of(work, typeof(*i915), gem.idle_work.work);
>   
> -	if (!mutex_trylock(&i915->drm.struct_mutex)) {
> -		/* Currently busy, come back later */
> -		mod_delayed_work(i915->wq,
> -				 &i915->gem.idle_work,
> -				 msecs_to_jiffies(50));
> -		return;
> -	}
> +	mutex_lock(&i915->drm.struct_mutex);
>   
>   	intel_wakeref_lock(&i915->gt.wakeref);
>   	if (!intel_wakeref_active(&i915->gt.wakeref))
> 

I don't see any real downsides to this indeed.

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

A possible tweak would be to leave this as is, maybe just not go for the
reduced idle timer on re-schedule, but add a cancel_delayed_work on the
unparking side of things. That way any mutex activity without actual
device unparking would only slightly delay going idle, while idle_work
would retain its minimal disturbance of the mutex.

Regards,

Tvrtko

* Re: [PATCH 2/2] drm/i915: Cancel retire_worker on parking
  2019-04-30  9:44 ` [PATCH 2/2] drm/i915: Cancel retire_worker on parking Chris Wilson
@ 2019-04-30 13:22   ` Tvrtko Ursulin
  0 siblings, 0 replies; 5+ messages in thread
From: Tvrtko Ursulin @ 2019-04-30 13:22 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 30/04/2019 10:44, Chris Wilson wrote:
> Replace the racy continuation check within retire_work with a definite
> kill-switch on idling. The race was being exposed by gem_concurrent_blit
> where the retire_worker would be terminated too early leaving us
> spinning in debugfs/i915_drop_caches with nothing flushing the
> retirement queue.
> 
> That said, the igt trying to idle from one child while submitting from
> another may be a contributing factor as to why it runs so slowly...
> 
> Testcase: igt/gem_concurrent_blit
> Fixes: 79ffac8599c4 ("drm/i915: Invert the GEM wakeref hierarchy")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_gem_pm.c            | 27 ++++++++++---------
>   .../gpu/drm/i915/selftests/mock_gem_device.c  |  3 +--
>   2 files changed, 15 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_pm.c b/drivers/gpu/drm/i915/i915_gem_pm.c
> index 3b6e8d5be8e1..88be810758ae 100644
> --- a/drivers/gpu/drm/i915/i915_gem_pm.c
> +++ b/drivers/gpu/drm/i915/i915_gem_pm.c
> @@ -46,15 +46,23 @@ static void idle_work_handler(struct work_struct *work)
>   {
>   	struct drm_i915_private *i915 =
>   		container_of(work, typeof(*i915), gem.idle_work.work);
> +	bool restart = true;
>   
> +	cancel_delayed_work_sync(&i915->gem.retire_work);
>   	mutex_lock(&i915->drm.struct_mutex);

Wouldn't it be better to call cancel_delayed_work and then
i915_retire_requests under the lock?

With cancel_delayed_work_sync outside struct_mutex, it sounds like it
could miss a retire pass.

>   
>   	intel_wakeref_lock(&i915->gt.wakeref);
> -	if (!intel_wakeref_active(&i915->gt.wakeref))
> +	if (!intel_wakeref_active(&i915->gt.wakeref)) {
>   		i915_gem_park(i915);
> +		restart = false;
> +	}
>   	intel_wakeref_unlock(&i915->gt.wakeref);
>   
>   	mutex_unlock(&i915->drm.struct_mutex);
> +	if (restart)
> +		queue_delayed_work(i915->wq,
> +				   &i915->gem.retire_work,
> +				   round_jiffies_up_relative(HZ));
>   }
>   
>   static void retire_work_handler(struct work_struct *work)
> @@ -68,10 +76,9 @@ static void retire_work_handler(struct work_struct *work)
>   		mutex_unlock(&i915->drm.struct_mutex);
>   	}
>   
> -	if (intel_wakeref_active(&i915->gt.wakeref))
> -		queue_delayed_work(i915->wq,
> -				   &i915->gem.retire_work,
> -				   round_jiffies_up_relative(HZ));
> +	queue_delayed_work(i915->wq,
> +			   &i915->gem.retire_work,
> +			   round_jiffies_up_relative(HZ));

So retire runs until idle stops it - that sounds okay.

>   }
>   
>   static int pm_notifier(struct notifier_block *nb,
> @@ -159,15 +166,9 @@ void i915_gem_suspend(struct drm_i915_private *i915)
>   	 * reset the GPU back to its idle, low power state.
>   	 */
>   	GEM_BUG_ON(i915->gt.awake);
> -	cancel_delayed_work_sync(&i915->gpu_error.hangcheck_work);
> -
> -	drain_delayed_work(&i915->gem.retire_work);
> +	flush_delayed_work(&i915->gem.idle_work);
>   
> -	/*
> -	 * As the idle_work is rearming if it detects a race, play safe and
> -	 * repeat the flush until it is definitely idle.
> -	 */
> -	drain_delayed_work(&i915->gem.idle_work);
> +	cancel_delayed_work_sync(&i915->gpu_error.hangcheck_work);
>   
>   	i915_gem_drain_freed_objects(i915);
>   
> diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> index e4033d0576c4..ce54f8dc13cc 100644
> --- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> +++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> @@ -58,8 +58,7 @@ static void mock_device_release(struct drm_device *dev)
>   	i915_gem_contexts_lost(i915);
>   	mutex_unlock(&i915->drm.struct_mutex);
>   
> -	drain_delayed_work(&i915->gem.retire_work);
> -	drain_delayed_work(&i915->gem.idle_work);
> +	flush_delayed_work(&i915->gem.idle_work);
>   	i915_gem_drain_workqueue(i915);
>   
>   	mutex_lock(&i915->drm.struct_mutex);
> 

I am now thinking debugfs does not have to do things indirectly via
flush and drain. How about it calls what it needs directly? Unless I am
missing something, that could be done separately from this patch and
would also fix the drop_caches spinning problem.

Regards,

Tvrtko

* ✗ Fi.CI.BAT: failure for series starting with [1/2] drm/i915: Wait for the struct_mutex on idling
  2019-04-30  9:44 [PATCH 1/2] drm/i915: Wait for the struct_mutex on idling Chris Wilson
  2019-04-30  9:44 ` [PATCH 2/2] drm/i915: Cancel retire_worker on parking Chris Wilson
  2019-04-30 12:33 ` [PATCH 1/2] drm/i915: Wait for the struct_mutex on idling Tvrtko Ursulin
@ 2019-04-30 13:49 ` Patchwork
  2 siblings, 0 replies; 5+ messages in thread
From: Patchwork @ 2019-04-30 13:49 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [1/2] drm/i915: Wait for the struct_mutex on idling
URL   : https://patchwork.freedesktop.org/series/60098/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_6017 -> Patchwork_12907
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with Patchwork_12907 absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_12907, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: https://patchwork.freedesktop.org/api/1.0/series/60098/revisions/1/mbox/

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_12907:

### IGT changes ###

#### Possible regressions ####

  * igt@i915_selftest@live_execlists:
    - fi-kbl-r:           [PASS][1] -> [INCOMPLETE][2]
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6017/fi-kbl-r/igt@i915_selftest@live_execlists.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12907/fi-kbl-r/igt@i915_selftest@live_execlists.html

  
Known issues
------------

  Here are the changes found in Patchwork_12907 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@i915_selftest@live_workarounds:
    - fi-snb-2600:        [PASS][3] -> [INCOMPLETE][4] ([fdo#105411])
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6017/fi-snb-2600/igt@i915_selftest@live_workarounds.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12907/fi-snb-2600/igt@i915_selftest@live_workarounds.html

  * igt@kms_chamelium@dp-crc-fast:
    - fi-kbl-7500u:       [PASS][5] -> [DMESG-WARN][6] ([fdo#103841])
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6017/fi-kbl-7500u/igt@kms_chamelium@dp-crc-fast.html
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12907/fi-kbl-7500u/igt@kms_chamelium@dp-crc-fast.html

  
#### Possible fixes ####

  * igt@i915_selftest@live_contexts:
    - fi-bdw-gvtdvm:      [DMESG-FAIL][7] ([fdo#110235]) -> [PASS][8]
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6017/fi-bdw-gvtdvm/igt@i915_selftest@live_contexts.html
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12907/fi-bdw-gvtdvm/igt@i915_selftest@live_contexts.html
    - fi-skl-gvtdvm:      [DMESG-FAIL][9] ([fdo#110235]) -> [PASS][10]
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6017/fi-skl-gvtdvm/igt@i915_selftest@live_contexts.html
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12907/fi-skl-gvtdvm/igt@i915_selftest@live_contexts.html

  
  [fdo#103841]: https://bugs.freedesktop.org/show_bug.cgi?id=103841
  [fdo#105411]: https://bugs.freedesktop.org/show_bug.cgi?id=105411
  [fdo#110235]: https://bugs.freedesktop.org/show_bug.cgi?id=110235


Participating hosts (53 -> 44)
------------------------------

  Missing    (9): fi-kbl-soraka fi-ilk-m540 fi-hsw-4200u fi-skl-6770hq fi-byt-squawks fi-bsw-cyan fi-ctg-p8600 fi-icl-y fi-byt-clapper 


Build changes
-------------

  * Linux: CI_DRM_6017 -> Patchwork_12907

  CI_DRM_6017: 69c3a37af9430650d1fc2a5555d4d0786898694d @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4971: fc5e0467eb6913d21ad932aa8a31c77fdb5a9c77 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_12907: c144d3af59602fcb4ddb218a788c961e47317432 @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

c144d3af5960 drm/i915: Cancel retire_worker on parking
f2b3d409c989 drm/i915: Wait for the struct_mutex on idling

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12907/
