From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: Chris Wilson <chris@chris-wilson.co.uk>, intel-gfx@lists.freedesktop.org
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>, stable@vger.kernel.org
Subject: Re: [PATCH 4/4] drm/i915: Late request cancellations are harmful
Date: Mon, 11 Apr 2016 14:50:17 +0100 [thread overview]
Message-ID: <570BAB99.5050402@linux.intel.com> (raw)
In-Reply-To: <1460194059-12447-5-git-send-email-chris@chris-wilson.co.uk>
On 09/04/16 10:27, Chris Wilson wrote:
> Conceptually, each request is a record of a hardware transaction - we
> build up a list of pending commands and then either commit them to
> hardware, or cancel them. However, whilst building up the list of
> pending commands, we may modify state outside of the request and make
> references to the pending request. If we do so and then cancel that
> request, external objects then point to the deleted request leading to
> both graphical and memory corruption.
>
> The easiest example is to consider object/VMA tracking. When we mark an
> object as active in a request, we store a pointer to this, the most
> recent request, in the object. Then we want to free that object, we wait
> for the most recent request to be idle before proceeding (otherwise the
> hardware will write to pages now owned by the system, or we will attempt
> to read from those pages before the hardware is finished writing). If
> the request was cancelled instead, that wait completes immediately. As a
> result, all requests must be committed and not cancelled if the external
> state is unknown.
This was a bit hard to figure out.
So we cannot unwind because once we set last_read_req we lose the data
on what was the previous one, before this transaction started?
Intuitively I don't like the idea of sending unfinished stuff to the
GPU, when it failed at some random point in ring buffer preparation.
So I am struggling with reviewing this as I have in the previous round.
> All that remains of i915_gem_request_cancel() users are just a couple of
> extremely unlikely allocation failures, so remove the API entirely.
This parts feels extra weird because in the non-execbuf cases we
actually can cancel the transaction without any issues, correct?
Would middle-ground be to keep the cancellations for in-kernel submits,
and for execbuf rewind the ringbuf so only request post-amble is sent to
the GPU?
> A consequence of committing all incomplete requests is that we generate
> excess breadcrumbs and fill the ring much more often with dummy work. We
> have completely undone the outstanding_last_seqno optimisation.
>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93907
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> Cc: stable@vger.kernel.org
> ---
> drivers/gpu/drm/i915/i915_drv.h | 2 --
> drivers/gpu/drm/i915/i915_gem.c | 50 ++++++++++++------------------
> drivers/gpu/drm/i915/i915_gem_context.c | 21 ++++++-------
> drivers/gpu/drm/i915/i915_gem_execbuffer.c | 15 +++------
> drivers/gpu/drm/i915/intel_display.c | 2 +-
> drivers/gpu/drm/i915/intel_lrc.c | 4 +--
> drivers/gpu/drm/i915/intel_overlay.c | 8 ++---
> 7 files changed, 39 insertions(+), 63 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index a93e5dd4fa9a..f374db8de673 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2320,7 +2320,6 @@ struct drm_i915_gem_request {
> struct drm_i915_gem_request * __must_check
> i915_gem_request_alloc(struct intel_engine_cs *engine,
> struct intel_context *ctx);
> -void i915_gem_request_cancel(struct drm_i915_gem_request *req);
> void i915_gem_request_free(struct kref *req_ref);
> int i915_gem_request_add_to_client(struct drm_i915_gem_request *req,
> struct drm_file *file);
> @@ -2872,7 +2871,6 @@ int i915_gem_sw_finish_ioctl(struct drm_device *dev, void *data,
> struct drm_file *file_priv);
> void i915_gem_execbuffer_move_to_active(struct list_head *vmas,
> struct drm_i915_gem_request *req);
> -void i915_gem_execbuffer_retire_commands(struct i915_execbuffer_params *params);
> int i915_gem_ringbuffer_submission(struct i915_execbuffer_params *params,
> struct drm_i915_gem_execbuffer2 *args,
> struct list_head *vmas);
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 1c3ff56594d6..42227495803f 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2753,7 +2753,8 @@ __i915_gem_request_alloc(struct intel_engine_cs *engine,
> * fully prepared. Thus it can be cleaned up using the proper
> * free code.
> */
> - i915_gem_request_cancel(req);
> + intel_ring_reserved_space_cancel(req->ringbuf);
> + i915_gem_request_unreference(req);
> return ret;
> }
>
> @@ -2790,13 +2791,6 @@ i915_gem_request_alloc(struct intel_engine_cs *engine,
> return err ? ERR_PTR(err) : req;
> }
>
> -void i915_gem_request_cancel(struct drm_i915_gem_request *req)
> -{
> - intel_ring_reserved_space_cancel(req->ringbuf);
> -
> - i915_gem_request_unreference(req);
> -}
> -
> struct drm_i915_gem_request *
> i915_gem_find_active_request(struct intel_engine_cs *engine)
> {
> @@ -3410,12 +3404,9 @@ int i915_gpu_idle(struct drm_device *dev)
> return PTR_ERR(req);
>
> ret = i915_switch_context(req);
> - if (ret) {
> - i915_gem_request_cancel(req);
> - return ret;
> - }
> -
> i915_add_request_no_flush(req);
> + if (ret)
> + return ret;
Looks like with this it could execute the context switch on the GPU but
not update the engine->last_context in do_switch().
Regards,
Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
next prev parent reply other threads:[~2016-04-11 13:50 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-04-09 9:27 GPU and machine hang fixes, take 2 Chris Wilson
2016-04-09 9:27 ` [PATCH 1/4] drm/i915: Prevent machine death on Ivybridge context switching Chris Wilson
2016-04-09 9:27 ` [PATCH 2/4] drm/i915: Force ringbuffers to not be at offset 0 Chris Wilson
2016-04-09 9:39 ` Chris Wilson
2016-04-09 9:27 ` [PATCH 3/4] drm/i915: Move the mb() following release-mmap into release-mmap Chris Wilson
2016-04-09 9:27 ` [PATCH 4/4] drm/i915: Late request cancellations are harmful Chris Wilson
2016-04-11 13:50 ` Tvrtko Ursulin [this message]
2016-04-11 14:19 ` Chris Wilson
2016-04-12 6:23 ` [PATCH] drm/i915: Reorganise legacy context switch to cope with late failure Chris Wilson
2016-04-09 9:56 ` ✗ Fi.CI.BAT: warning for series starting with [1/4] drm/i915: Prevent machine death on Ivybridge context switching Patchwork
2016-04-09 10:39 ` Chris Wilson
2016-04-12 7:54 ` ✗ Fi.CI.BAT: warning for series starting with [1/4] drm/i915: Prevent machine death on Ivybridge context switching (rev2) Patchwork
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=570BAB99.5050402@linux.intel.com \
--to=tvrtko.ursulin@linux.intel.com \
--cc=chris@chris-wilson.co.uk \
--cc=daniel.vetter@ffwll.ch \
--cc=intel-gfx@lists.freedesktop.org \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox