From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
To: Chris Wilson <chris@chris-wilson.co.uk>,
intel-gfx@lists.freedesktop.org,
Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: "Kuoppala, Mika" <mika.kuoppala@intel.com>
Subject: Re: [PATCH 02/38] drm/i915: Stop the machine whilst capturing the GPU crash dump
Date: Mon, 26 Sep 2016 11:58:34 +0300 [thread overview]
Message-ID: <1474880314.3339.24.camel@linux.intel.com> (raw)
In-Reply-To: <20160920083012.2754-3-chris@chris-wilson.co.uk>
On ti, 2016-09-20 at 09:29 +0100, Chris Wilson wrote:
> The error state is purposefully racy as we expect it to be called at any
> time and so have avoided any locking whilst capturing the crash dump.
> However, with multi-engine GPUs and multiple CPUs, those races can
> manifest into OOPSes as we attempt to chase dangling pointers freed on
> other CPUs. Under discussion are lots of ways to slow down normal
> operation in order to protect the post-mortem error capture, but what it
> we take the opposite approach and freeze the machine whilst the error
> capture runs (note the GPU may still running, but as long as we don't
> process any of the results the driver's bookkeeping will be static).
>
> Note that by of itself, this is not a complete fix. It also depends on
> the compiler barriers in list_add/list_del to prevent traversing the
> lists into the void. We also depend that we only require state from
> carefully controlled sources - i.e. all the state we require for
> post-mortem debugging should be reachable from the request itself so
> that we only have to worry about retrieving the request carefully. Once
> we have the request, we know that all pointers from it are intact.
>
> v2: Avoid drm_clflush_pages() inside stop_machine() as it may use
> stop_machine() itself for its wbinvd fallback.
>
I'm rather sure I already reviewed this code-wise, pending for an A-b
from Daniel.
Regards, Joonas
--
Joonas Lahtinen
Open Source Technology Center
Intel Corporation
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
next prev parent reply other threads:[~2016-09-26 8:58 UTC|newest]
Thread overview: 64+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-09-20 8:29 Multiple timelines, take 2 Chris Wilson
2016-09-20 8:29 ` [PATCH 01/38] drm/i915: Allow disabling error capture Chris Wilson
2016-09-21 6:13 ` Joonas Lahtinen
2016-09-20 8:29 ` [PATCH 02/38] drm/i915: Stop the machine whilst capturing the GPU crash dump Chris Wilson
2016-09-26 8:58 ` Joonas Lahtinen [this message]
2016-09-20 8:29 ` [PATCH 03/38] drm/i915: Always use the GTT for error capture Chris Wilson
2016-09-21 7:24 ` Joonas Lahtinen
2016-09-20 8:29 ` [PATCH 04/38] drm/i915: Consolidate error object printing Chris Wilson
2016-09-20 8:29 ` [PATCH 05/38] drm/i915: Compress GPU objects in error state Chris Wilson
2016-09-21 7:55 ` Joonas Lahtinen
2016-09-20 8:29 ` [PATCH 06/38] drm/i915: Support asynchronous waits on struct fence from i915_gem_request Chris Wilson
2016-09-21 8:05 ` Joonas Lahtinen
2016-09-20 8:29 ` [PATCH 07/38] drm/i915: Allow i915_sw_fence_await_sw_fence() to allocate Chris Wilson
2016-09-20 8:29 ` [PATCH 08/38] drm/i915: Rearrange i915_wait_request() accounting with callers Chris Wilson
2016-09-21 8:12 ` Joonas Lahtinen
2016-09-20 8:29 ` [PATCH 09/38] drm/i915: Remove unused i915_gem_active_wait() in favour of _unlocked() Chris Wilson
2016-09-20 8:29 ` [PATCH 10/38] drm/i915: Defer active reference until required Chris Wilson
2016-09-21 8:44 ` Joonas Lahtinen
2016-09-20 8:29 ` [PATCH 11/38] drm/i915: Introduce an internal allocator for disposable private objects Chris Wilson
2016-09-21 11:50 ` Joonas Lahtinen
2016-09-27 9:10 ` Chris Wilson
2016-09-20 8:29 ` [PATCH 12/38] drm/i915: Reuse the active golden render state batch Chris Wilson
2016-09-26 7:24 ` Joonas Lahtinen
2016-09-20 8:29 ` [PATCH 13/38] drm/i915: Markup GEM API with lockdep asserts Chris Wilson
2016-09-21 11:56 ` Joonas Lahtinen
2016-09-20 8:29 ` [PATCH 14/38] drm/i915: Use a radixtree for random access to the object's backing storage Chris Wilson
2016-09-20 8:29 ` [PATCH 15/38] drm/i915: Refactor object page API Chris Wilson
2016-09-20 8:29 ` [PATCH 16/38] drm/i915: Pass around sg_table to get_pages/put_pages backend Chris Wilson
2016-09-20 11:24 ` kbuild test robot
2016-09-20 8:29 ` [PATCH 17/38] drm/i915: Move object backing storage manipulation to its own locking Chris Wilson
2016-09-20 8:29 ` [PATCH 18/38] drm/i915/dmabuf: Acquire the backing storage outside of struct_mutex Chris Wilson
2016-09-20 8:29 ` [PATCH 19/38] drm/i915: Implement pread without struct-mutex Chris Wilson
2016-09-20 8:29 ` [PATCH 20/38] drm/i915: Implement pwrite " Chris Wilson
2016-09-20 13:47 ` kbuild test robot
2016-09-20 8:29 ` [PATCH 21/38] drm/i915: Acquire the backing storage outside of struct_mutex in set-domain Chris Wilson
2016-09-20 8:29 ` [PATCH 22/38] drm/i915: Move object release to a freelist + worker Chris Wilson
2016-09-20 8:29 ` [PATCH 23/38] drm/i915: Use lockless object free Chris Wilson
2016-09-20 8:29 ` [PATCH 24/38] drm/i915: Move GEM activity tracking into a common struct reservation_object Chris Wilson
2016-09-26 7:53 ` Joonas Lahtinen
2016-09-20 8:29 ` [PATCH 25/38] drm: Add reference counting to drm_atomic_state Chris Wilson
2016-09-21 7:24 ` Sean Paul
2016-09-20 8:30 ` [PATCH 26/38] drm/i915: Restore nonblocking awaits for modesetting Chris Wilson
2016-09-26 8:11 ` Joonas Lahtinen
2016-09-20 8:30 ` [PATCH 27/38] drm/i915: Combine seqno + tracking into a global timeline struct Chris Wilson
2016-09-20 8:30 ` [PATCH 28/38] drm/i915: Queue the idling context switch after all other timelines Chris Wilson
2016-09-26 8:49 ` Joonas Lahtinen
2016-09-20 8:30 ` [PATCH 29/38] drm/i915: Wait first for submission, before waiting for request completion Chris Wilson
2016-09-20 8:30 ` [PATCH 30/38] drm/i915: Introduce a global_seqno for each request Chris Wilson
2016-09-20 8:30 ` [PATCH 31/38] drm/i915: Record space required for request emission Chris Wilson
2016-09-20 8:30 ` [PATCH 32/38] drm/i915: Defer " Chris Wilson
2016-09-26 8:53 ` Joonas Lahtinen
2016-09-26 9:04 ` Chris Wilson
2016-09-26 9:06 ` Joonas Lahtinen
2016-09-26 9:25 ` Chris Wilson
2016-09-20 8:30 ` [PATCH 33/38] drm/i915: Move the global sync optimisation to the timeline Chris Wilson
2016-09-20 8:30 ` [PATCH 34/38] drm/i915: Create a unique name for the context Chris Wilson
2016-09-20 8:30 ` [PATCH 35/38] drm/i915: Reserve space in the global seqno during request allocation Chris Wilson
2016-09-20 18:49 ` [PATCH] drm/i915: fix semicolon.cocci warnings kbuild test robot
2016-09-20 18:49 ` [PATCH 35/38] drm/i915: Reserve space in the global seqno during request allocation kbuild test robot
2016-09-20 8:30 ` [PATCH 36/38] drm/i915: Enable multiple timelines Chris Wilson
2016-09-26 8:55 ` Joonas Lahtinen
2016-09-20 8:30 ` [PATCH 37/38] drm/i915: Enable userspace to opt-out of implicit fencing Chris Wilson
2016-09-20 8:30 ` [PATCH 38/38] drm/i915: Support explicit fencing for execbuf Chris Wilson
2016-09-20 9:24 ` ✗ Fi.CI.BAT: failure for series starting with [01/38] drm/i915: Allow disabling error capture Patchwork
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1474880314.3339.24.camel@linux.intel.com \
--to=joonas.lahtinen@linux.intel.com \
--cc=chris@chris-wilson.co.uk \
--cc=daniel.vetter@ffwll.ch \
--cc=intel-gfx@lists.freedesktop.org \
--cc=mika.kuoppala@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).