From: Mika Kuoppala <mika.kuoppala@linux.intel.com>
To: Chris Wilson <chris@chris-wilson.co.uk>, intel-gfx@lists.freedesktop.org
Subject: Re: [PATCH 07/25] drm/i915: Cancel context if it hangs after it is closed
Date: Mon, 11 Nov 2019 13:25:21 +0200
Message-ID: <87zhh24pum.fsf@gaia.fi.intel.com>
In-Reply-To: <157347027385.28106.12299078517436926628@skylake-alporthouse-com>
Chris Wilson <chris@chris-wilson.co.uk> writes:
> Quoting Mika Kuoppala (2019-11-11 10:54:14)
>> Chris Wilson <chris@chris-wilson.co.uk> writes:
>>
>> > If we detect a hang in a closed context, just flush all of its requests
>> > and cancel any remaining execution along the context. Note that after
>> > closing the context, the last reference to the context may be dropped,
>> > leaving it only valid under RCU.
>>
>> Sounds good. But is there a window for userspace to start
>> to see -EIO if it resubmits to a closed context?
>
> Userspace can not submit to a closed context (-ENOENT) as that would be
> tantamount to a use-after-free kernel bug.
>
>> In other words, after userspace doing gem_ctx_destroy(ctx_handle),
>> we would return -EINVAL due to ctx_handle being stale
>> earlier than we check for banned status and return -EIO?
>
> It's as simple as if the context is closed, it is removed from the
> file->context_idr and userspace cannot access it. If userspace is racing
> with itself, there's not much we can do other than protect our
> references. If userspace succeeds in submitting to the context prior to
> closing it in another thread, it has the context to continue (and if it
> then hangs, it will be shot down immediately). If it loses that race, it
> gets an -ENOENT. If it loses that race so badly the context id is
> replaced by a new context, it submits to that new context; which surely
> will end in tears and GPU hangs, but not our fault and nothing we can do
> to prevent that.
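For reference, the outcomes described above can be sketched with a small
userspace model. This is only an illustration, not i915 code: toy_file,
toy_ctx, toy_create(), toy_close() and toy_submit() are hypothetical names,
and a fixed array stands in for file->context_idr. The point it tries to
show is the ordering: the handle lookup happens first, so a closed handle
yields -ENOENT before any banned check could return -EIO.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names are real i915 symbols. */
#define TOY_MAX_CTX 4

struct toy_ctx {
	bool banned;	/* set once the context has been shot down */
	int submitted;	/* number of accepted submissions */
};

struct toy_file {
	/* Stands in for file->context_idr: handle -> context. */
	struct toy_ctx *slots[TOY_MAX_CTX];
};

/* Publish a context under the lowest free handle, as an idr would. */
static int toy_create(struct toy_file *f, struct toy_ctx *ctx)
{
	for (int id = 0; id < TOY_MAX_CTX; id++) {
		if (!f->slots[id]) {
			f->slots[id] = ctx;
			return id;
		}
	}
	return -ENOSPC;
}

/* Closing removes the handle; later lookups can no longer find it. */
static int toy_close(struct toy_file *f, int id)
{
	if (id < 0 || id >= TOY_MAX_CTX || !f->slots[id])
		return -ENOENT;
	f->slots[id] = NULL;
	return 0;
}

/* Lookup comes first, so a stale handle never reaches the banned check. */
static int toy_submit(struct toy_file *f, int id)
{
	struct toy_ctx *ctx;

	if (id < 0 || id >= TOY_MAX_CTX)
		return -ENOENT;

	ctx = f->slots[id];
	if (!ctx)
		return -ENOENT;	/* closed, or never existed */
	if (ctx->banned)
		return -EIO;	/* still open, but banned */

	ctx->submitted++;
	return 0;
}
```

In this model, submitting to a closed handle returns -ENOENT, and if the
handle has been reused by a later toy_create(), the submission lands on the
new context, which is the "loses that race so badly" case.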
Let them shed tears if they bring it on themselves. I was concerned
about a behavioural change in the close/resubmit race. But as you
explained, if they end up racing onto a different id, they deserve what
they asked for. We are in the business of protecting the state of all
the sane ones.
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx