From: "Ville Syrjälä" <ville.syrjala@linux.intel.com>
To: Daniel Vetter <daniel@ffwll.ch>
Cc: intel-gfx <intel-gfx@lists.freedesktop.org>,
dri-devel <dri-devel@lists.freedesktop.org>,
stable <stable@vger.kernel.org>,
Chris Wilson <chris@chris-wilson.co.uk>
Subject: Re: [PATCH 5/5] drm/i915: Solve the GPU reset vs. modeset deadlocks with an rw_semaphore
Date: Fri, 30 Jun 2017 21:46:36 +0300 [thread overview]
Message-ID: <20170630184636.GA12629@intel.com> (raw)
In-Reply-To: <CAKMK7uETmJ2Ja9NANpJOhz4weSLh5-JjWES7Xi3PJiMVAi0_Fg@mail.gmail.com>
On Fri, Jun 30, 2017 at 08:23:58PM +0200, Daniel Vetter wrote:
> On Fri, Jun 30, 2017 at 5:44 PM, Ville Syrjälä
> <ville.syrjala@linux.intel.com> wrote:
> >> And if the GEM folks insist the old behavior can't be restored, then we
> >> just need a tailor-made get-out-of-jail card for gen4 gpu reset somewhere
> >> in i915_sw_fence. Force-completing all render requests atomic updates
> >> depend upon is imo the simplest solution to this, and we've had a driver
> >> that worked like that for years.
> >
> > And it used to break all the time. I think we've had to fix it at least
> > three times by now. So I tend to think it's better to fix it in a way
> > that won't break so easily.
>
> Why exactly is making the atomic code massive more tricky the easy
> fix?
I don't see what this massive trickyness is. Compared to the rest of
atomic what I have is absolutely trivial. Just the
duplicate_committed_state() and the '.committed_state = foo'
assignments in hw_done(). That's it really.
> That's the part I don't get. Yes it got broken a bunch because no
> one runs CI and everyone forgets that gen3/4 reset the display in gpu
> reset, but in the end we do have a depency loop, and either the
> modeset side or the render side needs to bail out and cancel it's
> async stuff (whether that's a request or a nonblocking flip/atomic
> commit doesn't matter). In my opinion, cancelling the request (even if
> we're clever and only cancel the requests for the modeset waiters,
> which isn't to hard to pull off) seems about the simplest option.
> Especially since we need that code anyway, even TDR can't safe
> everything and resubmit under all circumstances (at least the buggy
> batch can't be resubmitted).
>
> Cancelling any kind of atomic commit otoh looks like a lot more
> complexity.
I'm not cancelling anything.
> Why do you think this is the easier, or at least less
> fragile option? This patch series is full of FIXMEs, and even the more
> complete set seems to have a pile of holes. Plus we need to stop using
> obj->state, and I don't see any easy way to test for that (since the
> gen3/4 gpu reset case is the only corner cases that currently needs
> that).
We need to fix that stuff anyway if we ever want to queue up multiple
commits for the same crtc. The stuff I have that is specific to this
reset stuff is actually very simple. The rest is just fixing up the
huge mess we've already made.
--
Ville Syrjälä
Intel OTC
WARNING: multiple messages have this Message-ID (diff)
From: "Ville Syrjälä" <ville.syrjala@linux.intel.com>
To: Daniel Vetter <daniel@ffwll.ch>
Cc: intel-gfx <intel-gfx@lists.freedesktop.org>,
dri-devel <dri-devel@lists.freedesktop.org>,
stable <stable@vger.kernel.org>,
Chris Wilson <chris@chris-wilson.co.uk>
Subject: Re: [PATCH 5/5] drm/i915: Solve the GPU reset vs. modeset deadlocks with an rw_semaphore
Date: Fri, 30 Jun 2017 21:46:36 +0300 [thread overview]
Message-ID: <20170630184636.GA12629@intel.com> (raw)
In-Reply-To: <CAKMK7uETmJ2Ja9NANpJOhz4weSLh5-JjWES7Xi3PJiMVAi0_Fg@mail.gmail.com>
On Fri, Jun 30, 2017 at 08:23:58PM +0200, Daniel Vetter wrote:
> On Fri, Jun 30, 2017 at 5:44 PM, Ville Syrj�l�
> <ville.syrjala@linux.intel.com> wrote:
> >> And if the GEM folks insist the old behavior can't be restored, then we
> >> just need a tailor-made get-out-of-jail card for gen4 gpu reset somewhere
> >> in i915_sw_fence. Force-completing all render requests atomic updates
> >> depend upon is imo the simplest solution to this, and we've had a driver
> >> that worked like that for years.
> >
> > And it used to break all the time. I think we've had to fix it at least
> > three times by now. So I tend to think it's better to fix it in a way
> > that won't break so easily.
>
> Why exactly is making the atomic code massive more tricky the easy
> fix?
I don't see what this massive trickyness is. Compared to the rest of
atomic what I have is absolutely trivial. Just the
duplicate_committed_state() and the '.committed_state = foo'
assignments in hw_done(). That's it really.
> That's the part I don't get. Yes it got broken a bunch because no
> one runs CI and everyone forgets that gen3/4 reset the display in gpu
> reset, but in the end we do have a depency loop, and either the
> modeset side or the render side needs to bail out and cancel it's
> async stuff (whether that's a request or a nonblocking flip/atomic
> commit doesn't matter). In my opinion, cancelling the request (even if
> we're clever and only cancel the requests for the modeset waiters,
> which isn't to hard to pull off) seems about the simplest option.
> Especially since we need that code anyway, even TDR can't safe
> everything and resubmit under all circumstances (at least the buggy
> batch can't be resubmitted).
>
> Cancelling any kind of atomic commit otoh looks like a lot more
> complexity.
I'm not cancelling anything.
> Why do you think this is the easier, or at least less
> fragile option? This patch series is full of FIXMEs, and even the more
> complete set seems to have a pile of holes. Plus we need to stop using
> obj->state, and I don't see any easy way to test for that (since the
> gen3/4 gpu reset case is the only corner cases that currently needs
> that).
We need to fix that stuff anyway if we ever want to queue up multiple
commits for the same crtc. The stuff I have that is specific to this
reset stuff is actually very simple. The rest is just fixing up the
huge mess we've already made.
--
Ville Syrj�l�
Intel OTC
next prev parent reply other threads:[~2017-06-30 18:46 UTC|newest]
Thread overview: 44+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-06-29 13:49 [PATCH 0/5] drm/i915: Fix pre-g4x GPU reset, again ville.syrjala
2017-06-29 13:49 ` [PATCH 1/5] drm/atomic: Refactor drm_atomic_state_realloc_connectors() ville.syrjala
2017-06-30 13:15 ` Daniel Vetter
2017-06-29 13:49 ` [PATCH 2/5] drm/atomic: Introduce drm_atomic_helper_duplicate_committed_state() ville.syrjala
2017-06-29 13:49 ` [PATCH 3/5] drm/i915% Store vma gtt offset in plane state ville.syrjala
2017-06-29 13:49 ` [PATCH 4/5] drm/i915: Refactor __intel_atomic_commit_tail() ville.syrjala
2017-06-29 13:49 ` [PATCH 5/5] drm/i915: Solve the GPU reset vs. modeset deadlocks with an rw_semaphore ville.syrjala
2017-06-29 13:56 ` Chris Wilson
2017-06-29 14:36 ` [PATCH v4 " ville.syrjala
2017-06-29 14:36 ` ville.syrjala
2017-06-29 17:57 ` Chris Wilson
2017-06-29 19:26 ` Ville Syrjälä
2017-06-29 19:26 ` Ville Syrjälä
2017-06-30 13:35 ` Daniel Vetter
2017-06-30 13:35 ` Daniel Vetter
2017-06-30 13:53 ` Ville Syrjälä
2017-06-30 13:53 ` Ville Syrjälä
2017-06-30 15:30 ` Daniel Vetter
2017-06-30 15:30 ` Daniel Vetter
2017-06-30 15:39 ` Chris Wilson
2017-06-30 15:39 ` Chris Wilson
2017-07-03 8:03 ` Daniel Vetter
2017-07-03 8:03 ` Daniel Vetter
2017-07-03 8:16 ` Chris Wilson
2017-07-03 16:42 ` Daniel Vetter
2017-07-03 16:42 ` Daniel Vetter
2017-06-30 13:25 ` [PATCH " Daniel Vetter
2017-06-30 13:25 ` Daniel Vetter
2017-06-30 13:30 ` Daniel Vetter
2017-06-30 13:30 ` Daniel Vetter
2017-06-30 15:44 ` Ville Syrjälä
2017-06-30 15:44 ` Ville Syrjälä
2017-06-30 18:23 ` Daniel Vetter
2017-06-30 18:46 ` Ville Syrjälä [this message]
2017-06-30 18:46 ` Ville Syrjälä
2017-07-03 7:55 ` Daniel Vetter
2017-07-03 7:55 ` Daniel Vetter
2017-07-03 9:30 ` Ville Syrjälä
2017-07-03 9:30 ` Ville Syrjälä
2017-07-03 16:48 ` Daniel Vetter
2017-07-03 16:48 ` Daniel Vetter
2017-06-29 14:24 ` ✓ Fi.CI.BAT: success for drm/i915: Fix pre-g4x GPU reset, again Patchwork
2017-06-29 15:07 ` Ville Syrjälä
2017-06-29 14:55 ` ✓ Fi.CI.BAT: success for drm/i915: Fix pre-g4x GPU reset, again (rev2) Patchwork
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170630184636.GA12629@intel.com \
--to=ville.syrjala@linux.intel.com \
--cc=chris@chris-wilson.co.uk \
--cc=daniel@ffwll.ch \
--cc=dri-devel@lists.freedesktop.org \
--cc=intel-gfx@lists.freedesktop.org \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.