From: "Ville Syrjälä" <ville.syrjala@linux.intel.com>
To: Daniel Vetter <daniel@ffwll.ch>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>,
intel-gfx@lists.freedesktop.org, stable@vger.kernel.org,
dri-devel@lists.freedesktop.org
Subject: Re: [PATCH v4 5/5] drm/i915: Solve the GPU reset vs. modeset deadlocks with an rw_semaphore
Date: Fri, 30 Jun 2017 16:53:19 +0300 [thread overview]
Message-ID: <20170630135319.GX12629@intel.com> (raw)
In-Reply-To: <20170630133503.4ykvlvtj4wd4nacf@phenom.ffwll.local>
On Fri, Jun 30, 2017 at 03:35:03PM +0200, Daniel Vetter wrote:
> On Thu, Jun 29, 2017 at 10:26:08PM +0300, Ville Syrjälä wrote:
> > On Thu, Jun 29, 2017 at 06:57:30PM +0100, Chris Wilson wrote:
> > > Quoting ville.syrjala@linux.intel.com (2017-06-29 15:36:42)
> > > > From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > > >
> > > > Introduce an rw_semaphore to protect the display commits. All normal
> > > > commits use down_read() and hence can proceed in parallel, but GPU reset
> > > > will use down_write() making sure no other commits are in progress when
> > > > we have to pull the plug on the display engine on pre-g4x platforms.
> > > > There are no modeset/gem locks taken inside __intel_atomic_commit_tail()
> > > > itself, and we wait for all dependencies before the down_read(), and
> > > > thus there is no chance of deadlocks with this scheme.
> > >
> > > This matches what I thought should be done (I didn't think of using
> > > rwsem just a mutex, nice touch). The point I got stuck on was what
> > > should be done after the reset? I expected another modeset to return the
> > > state back or otherwise the inflight would get confused?
> >
> > I guess that can happen. For instance, if we have a crtc_enable() in flight,
> > and then we do a reset before it gets committed we would end up doing
> > crtc_enable() twice in a row without a crtc_disable in between. For page
> > flips and such this shouldn't be a big deal in general.
>
> atomic commits are ordered. You have to wait for the previous ones to
> complete before you do a new one. If you don't do that, then all hell
> breaks loose.
What we're effectively doing here is inserting two new commits in
the middle of the timeline, one to disable everything, and another
one to re-enable everything. At the end of the the re-enable we would
want the hardware state should match exactly what was there before
the disable, hence any commits still in the timeline should work
correctly. That is, their old_state will match the hardware state
when it comes time to commit them.
But we can only do that properly after we start to track the committed
state. Without that tracking we can get into trouble wrt. the
hardware state not matching the old state when it comes time to
commit the new state.
>
> What you really can't do with atomic (without rewriting everything once
> more) is cancel a commit. Pre-atomic we could do that on gen4 since the
> mmio flips died with the gpu, but that's the one design change we need to
> cope with (plus TDR insisting it can't force-complete requests anymore).
>
> > > > During reset we should be recommiting the state that was committed last.
> > > > But for now we'll settle for recommiting the last state for each object.
> > >
> > > Ah, I guess that explains the above. What is the complication with
> > > restoring the current state as opposed to the next state?
> >
> > Well the main thing is just tracking which is the current state. That
> > just needs refactoring the .atomic_duplicate_state() calling convention
> > across the whole tree so that we can then duplicate the committed state
> > rather than the latest state.
> >
> > Also due to the commit_hw_done() being potentially done after the
> > modeset locks have been dropped, I don't think we can be certain
> > of it getting called in the same order as swap_state(), hence
> > when we track the committed state in commit_hw_done() we'll have
> > to have some way to figure out if our new state is in fact the
> > latest committed state for each object or if the calls got
> > reordered. We don't insert any dependencies between two commits
> > unless they touch the same active crtc, thus this reordering
> > seems very much possible. Dunno if we should add some way to add
> > such dependeencies whenever the same object is part of two otherwise
> > independent commits, or if we just want to try and work with the
> > reordered calls. My idea for the latter was some kind of seqno/age
> > counter on the object states that allows me to recongnize which
> > state is more recent. The object states aren't refcounted so hanging
> > on to the wrong pointer could cause an oops the next time we have to
> > perform a GPU reset.
>
> Atomic commits are strongly ordered on a given CRTC, so stuff can't be
> out-of-order within one. Across them the idea was to just add more CRTC
> states into the atomic commit to make sure stuff is ordered correctly.
And atomic commits not touching the same crtc will not be ordered in any
way. Thus if they touch the same object (eg. disabled plane or connector)
we can't currently tell if the commit_hw_done() calls happened in the same
order as the swap_state() calls for that particular object.
--
Ville Syrjälä
Intel OTC
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
WARNING: multiple messages have this Message-ID (diff)
From: "Ville Syrjälä" <ville.syrjala@linux.intel.com>
To: Daniel Vetter <daniel@ffwll.ch>
Cc: Chris Wilson <chris@chris-wilson.co.uk>,
intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
stable@vger.kernel.org, Daniel Vetter <daniel.vetter@ffwll.ch>
Subject: Re: [PATCH v4 5/5] drm/i915: Solve the GPU reset vs. modeset deadlocks with an rw_semaphore
Date: Fri, 30 Jun 2017 16:53:19 +0300 [thread overview]
Message-ID: <20170630135319.GX12629@intel.com> (raw)
In-Reply-To: <20170630133503.4ykvlvtj4wd4nacf@phenom.ffwll.local>
On Fri, Jun 30, 2017 at 03:35:03PM +0200, Daniel Vetter wrote:
> On Thu, Jun 29, 2017 at 10:26:08PM +0300, Ville Syrj�l� wrote:
> > On Thu, Jun 29, 2017 at 06:57:30PM +0100, Chris Wilson wrote:
> > > Quoting ville.syrjala@linux.intel.com (2017-06-29 15:36:42)
> > > > From: Ville Syrj�l� <ville.syrjala@linux.intel.com>
> > > >
> > > > Introduce an rw_semaphore to protect the display commits. All normal
> > > > commits use down_read() and hence can proceed in parallel, but GPU reset
> > > > will use down_write() making sure no other commits are in progress when
> > > > we have to pull the plug on the display engine on pre-g4x platforms.
> > > > There are no modeset/gem locks taken inside __intel_atomic_commit_tail()
> > > > itself, and we wait for all dependencies before the down_read(), and
> > > > thus there is no chance of deadlocks with this scheme.
> > >
> > > This matches what I thought should be done (I didn't think of using
> > > rwsem just a mutex, nice touch). The point I got stuck on was what
> > > should be done after the reset? I expected another modeset to return the
> > > state back or otherwise the inflight would get confused?
> >
> > I guess that can happen. For instance, if we have a crtc_enable() in flight,
> > and then we do a reset before it gets committed we would end up doing
> > crtc_enable() twice in a row without a crtc_disable in between. For page
> > flips and such this shouldn't be a big deal in general.
>
> atomic commits are ordered. You have to wait for the previous ones to
> complete before you do a new one. If you don't do that, then all hell
> breaks loose.
What we're effectively doing here is inserting two new commits in
the middle of the timeline, one to disable everything, and another
one to re-enable everything. At the end of the the re-enable we would
want the hardware state should match exactly what was there before
the disable, hence any commits still in the timeline should work
correctly. That is, their old_state will match the hardware state
when it comes time to commit them.
But we can only do that properly after we start to track the committed
state. Without that tracking we can get into trouble wrt. the
hardware state not matching the old state when it comes time to
commit the new state.
>
> What you really can't do with atomic (without rewriting everything once
> more) is cancel a commit. Pre-atomic we could do that on gen4 since the
> mmio flips died with the gpu, but that's the one design change we need to
> cope with (plus TDR insisting it can't force-complete requests anymore).
>
> > > > During reset we should be recommiting the state that was committed last.
> > > > But for now we'll settle for recommiting the last state for each object.
> > >
> > > Ah, I guess that explains the above. What is the complication with
> > > restoring the current state as opposed to the next state?
> >
> > Well the main thing is just tracking which is the current state. That
> > just needs refactoring the .atomic_duplicate_state() calling convention
> > across the whole tree so that we can then duplicate the committed state
> > rather than the latest state.
> >
> > Also due to the commit_hw_done() being potentially done after the
> > modeset locks have been dropped, I don't think we can be certain
> > of it getting called in the same order as swap_state(), hence
> > when we track the committed state in commit_hw_done() we'll have
> > to have some way to figure out if our new state is in fact the
> > latest committed state for each object or if the calls got
> > reordered. We don't insert any dependencies between two commits
> > unless they touch the same active crtc, thus this reordering
> > seems very much possible. Dunno if we should add some way to add
> > such dependeencies whenever the same object is part of two otherwise
> > independent commits, or if we just want to try and work with the
> > reordered calls. My idea for the latter was some kind of seqno/age
> > counter on the object states that allows me to recongnize which
> > state is more recent. The object states aren't refcounted so hanging
> > on to the wrong pointer could cause an oops the next time we have to
> > perform a GPU reset.
>
> Atomic commits are strongly ordered on a given CRTC, so stuff can't be
> out-of-order within one. Across them the idea was to just add more CRTC
> states into the atomic commit to make sure stuff is ordered correctly.
And atomic commits not touching the same crtc will not be ordered in any
way. Thus if they touch the same object (eg. disabled plane or connector)
we can't currently tell if the commit_hw_done() calls happened in the same
order as the swap_state() calls for that particular object.
--
Ville Syrj�l�
Intel OTC
next prev parent reply other threads:[~2017-06-30 13:53 UTC|newest]
Thread overview: 44+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-06-29 13:49 [PATCH 0/5] drm/i915: Fix pre-g4x GPU reset, again ville.syrjala
2017-06-29 13:49 ` [PATCH 1/5] drm/atomic: Refactor drm_atomic_state_realloc_connectors() ville.syrjala
2017-06-30 13:15 ` Daniel Vetter
2017-06-29 13:49 ` [PATCH 2/5] drm/atomic: Introduce drm_atomic_helper_duplicate_committed_state() ville.syrjala
2017-06-29 13:49 ` [PATCH 3/5] drm/i915% Store vma gtt offset in plane state ville.syrjala
2017-06-29 13:49 ` [PATCH 4/5] drm/i915: Refactor __intel_atomic_commit_tail() ville.syrjala
2017-06-29 13:49 ` [PATCH 5/5] drm/i915: Solve the GPU reset vs. modeset deadlocks with an rw_semaphore ville.syrjala
2017-06-29 13:56 ` Chris Wilson
2017-06-29 14:36 ` [PATCH v4 " ville.syrjala
2017-06-29 14:36 ` ville.syrjala
2017-06-29 17:57 ` Chris Wilson
2017-06-29 19:26 ` Ville Syrjälä
2017-06-29 19:26 ` Ville Syrjälä
2017-06-30 13:35 ` Daniel Vetter
2017-06-30 13:35 ` Daniel Vetter
2017-06-30 13:53 ` Ville Syrjälä [this message]
2017-06-30 13:53 ` Ville Syrjälä
2017-06-30 15:30 ` Daniel Vetter
2017-06-30 15:30 ` Daniel Vetter
2017-06-30 15:39 ` Chris Wilson
2017-06-30 15:39 ` Chris Wilson
2017-07-03 8:03 ` Daniel Vetter
2017-07-03 8:03 ` Daniel Vetter
2017-07-03 8:16 ` Chris Wilson
2017-07-03 16:42 ` Daniel Vetter
2017-07-03 16:42 ` Daniel Vetter
2017-06-30 13:25 ` [PATCH " Daniel Vetter
2017-06-30 13:25 ` Daniel Vetter
2017-06-30 13:30 ` Daniel Vetter
2017-06-30 13:30 ` Daniel Vetter
2017-06-30 15:44 ` Ville Syrjälä
2017-06-30 15:44 ` Ville Syrjälä
2017-06-30 18:23 ` Daniel Vetter
2017-06-30 18:46 ` Ville Syrjälä
2017-06-30 18:46 ` Ville Syrjälä
2017-07-03 7:55 ` Daniel Vetter
2017-07-03 7:55 ` Daniel Vetter
2017-07-03 9:30 ` Ville Syrjälä
2017-07-03 9:30 ` Ville Syrjälä
2017-07-03 16:48 ` Daniel Vetter
2017-07-03 16:48 ` Daniel Vetter
2017-06-29 14:24 ` ✓ Fi.CI.BAT: success for drm/i915: Fix pre-g4x GPU reset, again Patchwork
2017-06-29 15:07 ` Ville Syrjälä
2017-06-29 14:55 ` ✓ Fi.CI.BAT: success for drm/i915: Fix pre-g4x GPU reset, again (rev2) Patchwork
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170630135319.GX12629@intel.com \
--to=ville.syrjala@linux.intel.com \
--cc=daniel.vetter@ffwll.ch \
--cc=daniel@ffwll.ch \
--cc=dri-devel@lists.freedesktop.org \
--cc=intel-gfx@lists.freedesktop.org \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.