From: Arun Siluvery <arun.siluvery@linux.intel.com>
To: intel-gfx@lists.freedesktop.org
Cc: Mika Kuoppala <mika.kuoppala@intel.com>, Tomas Elf <tomas.elf@intel.com>
Subject: [PATCH 06/20] drm/i915: Reinstate hang recovery work queue.
Date: Wed, 13 Jan 2016 17:28:18 +0000
Message-ID: <1452706112-8617-7-git-send-email-arun.siluvery@linux.intel.com>
In-Reply-To: <1452706112-8617-1-git-send-email-arun.siluvery@linux.intel.com>
From: Tomas Elf <tomas.elf@intel.com>
There used to be a work queue separating the error handler from the hang
recovery path, which was removed a while back in this commit:
commit b8d24a06568368076ebd5a858a011699a97bfa42
Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Date: Wed Jan 28 17:03:14 2015 +0200
drm/i915: Remove nested work in gpu error handling
Most of that commit now needs to be reverted: the work queue separating hang
detection from hang recovery is needed in preparation for the upcoming watchdog
timeout feature. The watchdog interrupt service routine will be a second
call site of the error handler, alongside the periodic hang checker, which runs
in a work queue context. Because the error handler will then serve a caller
running in hard interrupt context, it must never need to grab the
struct_mutex. Unfortunately, grabbing the struct_mutex is the first thing the
hang recovery path does, and that may sleep if the struct_mutex is already
held by another thread. Sleeping is not allowed in hard interrupt context, so
hang recovery is deferred to a work item instead.
Signed-off-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
---
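Note for readers: the hand-off described above can be sketched in user space
as below. This is only an illustration under stated assumptions, not driver
code. Every toy_* name is hypothetical; the pending flag is a simplified
stand-in for how the kernel work queue treats an already-queued item, and the
two-step container_of walk mirrors the one in i915_error_work_func().

```c
/* User-space sketch only: all toy_* names are hypothetical, not kernel APIs. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Same idea as the kernel's container_of(), minus its type checking. */
#define toy_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct toy_work {
	bool pending;                         /* queued but not yet started */
	void (*func)(struct toy_work *work);
};

struct toy_gpu_error {
	int reset_requested;                  /* stands in for the reset flags */
	struct toy_work work;
};

struct toy_private {
	int resets_done;
	struct toy_gpu_error gpu_error;
};

/* Error handler side: never blocks, so an IRQ-like caller may use it. */
static bool toy_schedule_work(struct toy_work *work)
{
	if (work->pending)
		return false;                 /* already queued: not added twice */
	work->pending = true;
	return true;
}

/* Worker side: clear pending before running, so a later request re-queues. */
static void toy_run_pending(struct toy_work *work)
{
	if (!work->pending)
		return;
	work->pending = false;
	work->func(work);
}

/* Recovery callback: walk from the work item back to its owners. */
static void toy_error_work_func(struct toy_work *work)
{
	struct toy_gpu_error *error =
		toy_container_of(work, struct toy_gpu_error, work);
	struct toy_private *priv =
		toy_container_of(error, struct toy_private, gpu_error);

	if (error->reset_requested) {
		error->reset_requested = 0;
		priv->resets_done++;          /* sleeping, lock-taking work goes here */
	}
}

/* Hypothetical driver: two back-to-back error reports, one worker pass. */
static int toy_demo(void)
{
	struct toy_private priv = { 0 };

	priv.gpu_error.work.func = toy_error_work_func;
	priv.gpu_error.reset_requested = 1;

	bool first = toy_schedule_work(&priv.gpu_error.work);
	bool second = toy_schedule_work(&priv.gpu_error.work);
	assert(first && !second);

	toy_run_pending(&priv.gpu_error.work);
	return priv.resets_done;
}
```

In this sketch the second report is coalesced into the item that is already
queued, and the single worker pass still sees the reset request.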
drivers/gpu/drm/i915/i915_dma.c | 1 +
drivers/gpu/drm/i915/i915_drv.h | 1 +
drivers/gpu/drm/i915/i915_irq.c | 31 ++++++++++++++++++++++++-------
3 files changed, 26 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index c45ec353..67003c2 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -1203,6 +1203,7 @@ int i915_driver_unload(struct drm_device *dev)
/* Free error state after interrupts are fully disabled. */
cancel_delayed_work_sync(&dev_priv->gpu_error.hangcheck_work);
i915_destroy_error_state(dev);
+ cancel_work_sync(&dev_priv->gpu_error.work);
if (dev->pdev->msi_enabled)
pci_disable_msi(dev->pdev);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 5be7d3e..072ca37 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1337,6 +1337,7 @@ struct i915_gpu_error {
spinlock_t lock;
/* Protected by the above dev->gpu_error.lock. */
struct drm_i915_error_state *first_error;
+ struct work_struct work;
unsigned long missed_irq_rings;
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index fef74cf..8937c82 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -2457,16 +2457,19 @@ static void i915_error_wake_up(struct drm_i915_private *dev_priv,
}
/**
- * i915_reset_and_wakeup - do process context error handling work
- * @dev: drm device
+ * i915_error_work_func - do process context error handling work
+ * @work: work item containing error struct, passed by the error handler
*
* Fire an error uevent so userspace can see that a hang or error
* was detected.
*/
-static void i915_reset_and_wakeup(struct drm_device *dev)
+static void i915_error_work_func(struct work_struct *work)
{
- struct drm_i915_private *dev_priv = to_i915(dev);
- struct i915_gpu_error *error = &dev_priv->gpu_error;
+ struct i915_gpu_error *error = container_of(work, struct i915_gpu_error,
+ work);
+ struct drm_i915_private *dev_priv =
+ container_of(error, struct drm_i915_private, gpu_error);
+ struct drm_device *dev = dev_priv->dev;
char *error_event[] = { I915_ERROR_UEVENT "=1", NULL };
char *reset_event[] = { I915_RESET_UEVENT "=1", NULL };
char *reset_done_event[] = { I915_ERROR_UEVENT "=0", NULL };
@@ -2827,7 +2830,21 @@ void i915_handle_error(struct drm_device *dev, u32 engine_mask, bool wedged,
i915_error_wake_up(dev_priv, false);
}
- i915_reset_and_wakeup(dev);
+ /*
+ * Gen 7:
+ *
+ * Our reset work can grab modeset locks (since it needs to reset the
+ * state of outstanding pageflips). Hence it must not be run on our own
+ * dev_priv->wq work queue for otherwise the flush_work in the pageflip
+ * code will deadlock.
+ * If error_work is already in the work queue then it will not be added
+ * again. It hasn't yet executed so it will see the reset flags when
+ * it is scheduled. If it isn't in the queue or it is currently
+ * executing then this call will add it to the queue again so that
+ * even if it misses the reset flags during the current call it is
+ * guaranteed to see them on the next call.
+ */
+ schedule_work(&dev_priv->gpu_error.work);
}
/* Called from drm generic code, passed 'crtc' which
@@ -4682,7 +4699,7 @@ void intel_irq_init(struct drm_i915_private *dev_priv)
struct drm_device *dev = dev_priv->dev;
intel_hpd_init_work(dev_priv);
-
+ INIT_WORK(&dev_priv->gpu_error.work, i915_error_work_func);
INIT_WORK(&dev_priv->rps.work, gen6_pm_rps_work);
INIT_WORK(&dev_priv->l3_parity.error_work, ivybridge_parity_work);
--
1.9.1
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx