Re: [Intel-gfx] [PATCH 25/27] drm/i915/guc: Handle errors in multi-lrc requests

dri-devel.lists.freedesktop.org archive mirror
 help / color / mirror / Atom feed

From: Matthew Brost <matthew.brost@intel.com>
To: John Harrison <john.c.harrison@intel.com>
Cc: intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
	daniel.vetter@ffwll.ch, tony.ye@intel.com, zhengguo.xu@intel.com
Subject: Re: [Intel-gfx] [PATCH 25/27] drm/i915/guc: Handle errors in multi-lrc requests
Date: Wed, 29 Sep 2021 13:58:19 -0700	[thread overview]
Message-ID: <20210929205819.GA10003@jons-linux-dev-box> (raw)
In-Reply-To: <9c27ba5e-cf18-006f-b63a-908a863d52f7@intel.com>

On Wed, Sep 29, 2021 at 01:44:10PM -0700, John Harrison wrote:
> On 8/20/2021 15:44, Matthew Brost wrote:
> > If an error occurs in the front end when multi-lrc requests are getting
> > generated we need to skip these in the backend but we still need to
> > emit the breadcrumbs seqno. An issues arrises because with multi-lrc
> arrises -> arises
> 

Yep.

> > breadcrumbs there is a handshake between the parent and children to make
> > forwad progress. If all the requests are not present this handshake
> forwad -> forward
> 

Yep.

> > doesn't work. To work around this, if multi-lrc request has an error we
> > skip the handshake but still emit the breadcrumbs seqno.
> > 
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> >   .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 61 ++++++++++++++++++-
> >   1 file changed, 58 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > index 2ef38557b0f0..61e737fd1eee 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > @@ -3546,8 +3546,8 @@ static int emit_bb_start_child_no_preempt_mid_batch(struct i915_request *rq,
> >   }
> >   static u32 *
> > -emit_fini_breadcrumb_parent_no_preempt_mid_batch(struct i915_request *rq,
> > -						 u32 *cs)
> > +__emit_fini_breadcrumb_parent_no_preempt_mid_batch(struct i915_request *rq,
> > +						   u32 *cs)
> >   {
> >   	struct intel_context *ce = rq->context;
> >   	u8 i;
> > @@ -3575,6 +3575,41 @@ emit_fini_breadcrumb_parent_no_preempt_mid_batch(struct i915_request *rq,
> >   				  get_children_go_addr(ce),
> >   				  0);
> > +	return cs;
> > +}
> > +
> > +/*
> > + * If this true, a submission of multi-lrc requests had an error and the
> > + * requests need to be skipped. The front end (execuf IOCTL) should've called
> > + * i915_request_skip which squashes the BB but we still need to emit the fini
> > + * breadrcrumbs seqno write. At this point we don't know how many of the
> > + * requests in the multi-lrc submission were generated so we can't do the
> > + * handshake between the parent and children (e.g. if 4 requests should be
> > + * generated but 2nd hit an error only 1 would be seen by the GuC backend).
> > + * Simply skip the handshake, but still emit the breadcrumbd seqno, if an error
> > + * has occurred on any of the requests in submission / relationship.
> > + */
> > +static inline bool skip_handshake(struct i915_request *rq)
> > +{
> > +	return test_bit(I915_FENCE_FLAG_SKIP_PARALLEL, &rq->fence.flags);
> > +}
> > +
> > +static u32 *
> > +emit_fini_breadcrumb_parent_no_preempt_mid_batch(struct i915_request *rq,
> > +						 u32 *cs)
> > +{
> > +	struct intel_context *ce = rq->context;
> > +
> > +	GEM_BUG_ON(!intel_context_is_parent(ce));
> > +
> > +	if (unlikely(skip_handshake(rq))) {
> > +		memset(cs, 0, sizeof(u32) *
> > +		       (ce->engine->emit_fini_breadcrumb_dw - 6));
> > +		cs += ce->engine->emit_fini_breadcrumb_dw - 6;
> Why -6? There are 12 words about to be written. Indeed the value of
> emit_..._dw is '12 + 4*num_children'. This should only be skipping over the
> 4*children, right? As it stands, it will skip all but the last six words,
> then write an extra twelve words and thus overflow the reservation by six.
> Unless I am totally confused?
> 

Let me decode the length:

'Wait on children' (in __emit_fini_breadcrumb_parent_no_preempt_mid_batch) = 4 * num_children
'Turn on preemption' (in __emit_fini_breadcrumb_parent_no_preempt_mid_batch) = 2
'Tell children go' (in __emit_fini_breadcrumb_parent_no_preempt_mid_batch) = 4
'Emit fini breadcrumb' (in emit_fini_breadcrumb_child_no_preempt_mid_batch) = 4
'User interrupt' (in emit_fini_breadcrumb_child_no_preempt_mid_batch) = 2

So for a total (emit_fini_breadcrumb_dw) we have '12 + 4 * num_children'

We want skip everything in __emit_fini_breadcrumb_parent_no_preempt_mid_batch, so that is
'6 + 4 * num_children' or 'emit_fini_breadcrumb_dw - 6'

Make sense?

> I assume there is some reason why the amount of data written must exactly
> match the space reserved? It's a while since I've looked at the ring buffer
> code!
> 

I think it because the ring space is reserved at request creation time
but the fini breadcrumbs are not written until submission time.

> Seems like it would be clearer to not split the semaphore writes out but
> have them right next to the skip code that is meant to replicate them but
> with no-ops.
>

I guess that works too, I personally like the way it is but if you
insist I can change it.

> > +	} else {
> > +		cs = __emit_fini_breadcrumb_parent_no_preempt_mid_batch(rq, cs);
> > +	}
> > +
> >   	/* Emit fini breadcrumb */
> >   	cs = gen8_emit_ggtt_write(cs,
> >   				  rq->fence.seqno,
> > @@ -3591,7 +3626,8 @@ emit_fini_breadcrumb_parent_no_preempt_mid_batch(struct i915_request *rq,
> >   }
> >   static u32 *
> > -emit_fini_breadcrumb_child_no_preempt_mid_batch(struct i915_request *rq, u32 *cs)
> > +__emit_fini_breadcrumb_child_no_preempt_mid_batch(struct i915_request *rq,
> > +						  u32 *cs)
> >   {
> >   	struct intel_context *ce = rq->context;
> > @@ -3617,6 +3653,25 @@ emit_fini_breadcrumb_child_no_preempt_mid_batch(struct i915_request *rq, u32 *cs
> >   	*cs++ = get_children_go_addr(ce->parent);
> >   	*cs++ = 0;
> > +	return cs;
> > +}
> > +
> > +static u32 *
> > +emit_fini_breadcrumb_child_no_preempt_mid_batch(struct i915_request *rq,
> > +						u32 *cs)
> > +{
> > +	struct intel_context *ce = rq->context;
> > +
> > +	GEM_BUG_ON(!intel_context_is_child(ce));
> > +
> > +	if (unlikely(skip_handshake(rq))) {
> > +		memset(cs, 0, sizeof(u32) *
> > +		       (ce->engine->emit_fini_breadcrumb_dw - 6));
> > +		cs += ce->engine->emit_fini_breadcrumb_dw - 6;
> > +	} else {
> > +		cs = __emit_fini_breadcrumb_child_no_preempt_mid_batch(rq, cs);
> > +	}
> Same points as above - why -6 not -12 and would be clearer to keep the
> no-ops and the writes adjacent.
>

Same as above we are NOP the length of
__emit_fini_breadcrumb_child_no_preempt_mid_batch and still want the
emit breadcrumbs below.

Matt

 
> John.
> 
> > +
> >   	/* Emit fini breadcrumb */
> >   	cs = gen8_emit_ggtt_write(cs,
> >   				  rq->fence.seqno,
>

next prev parent reply	other threads:[~2021-09-29 21:03 UTC|newest]

Thread overview: 106+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-20 22:44 [PATCH 00/27] Parallel submission aka multi-bb execbuf Matthew Brost
2021-08-20 22:44 ` [PATCH 01/27] drm/i915/guc: Squash Clean up GuC CI failures, simplify locking, and kernel DOC Matthew Brost
2021-08-20 22:44 ` [PATCH 02/27] drm/i915/guc: Allow flexible number of context ids Matthew Brost
2021-09-09 22:13   ` [Intel-gfx] " John Harrison
2021-09-10  0:14     ` Matthew Brost
2021-08-20 22:44 ` [PATCH 03/27] drm/i915/guc: Connect the number of guc_ids to debugfs Matthew Brost
2021-09-09 22:16   ` [Intel-gfx] " John Harrison
2021-09-10  0:16     ` Matthew Brost
2021-08-20 22:44 ` [PATCH 04/27] drm/i915/guc: Take GT PM ref when deregistering context Matthew Brost
2021-09-09 22:28   ` [Intel-gfx] " John Harrison
2021-09-10  0:21     ` Matthew Brost
2021-09-13  9:55   ` Tvrtko Ursulin
2021-09-13 17:12     ` Matthew Brost
2021-09-14  8:41       ` Tvrtko Ursulin
2021-08-20 22:44 ` [PATCH 05/27] drm/i915: Add GT PM unpark worker Matthew Brost
2021-09-09 22:36   ` [Intel-gfx] " John Harrison
2021-09-10  0:34     ` Matthew Brost
2021-09-10  8:36   ` Tvrtko Ursulin
2021-09-10 20:09     ` Matthew Brost
2021-09-13 10:33       ` Tvrtko Ursulin
2021-09-13 17:20         ` Matthew Brost
2021-08-20 22:44 ` [PATCH 06/27] drm/i915/guc: Take engine PM when a context is pinned with GuC submission Matthew Brost
2021-09-09 22:46   ` [Intel-gfx] " John Harrison
2021-09-10  0:41     ` Matthew Brost
2021-09-13 22:26       ` John Harrison
2021-09-14  1:12         ` Matthew Brost
2021-08-20 22:44 ` [PATCH 07/27] drm/i915/guc: Don't call switch_to_kernel_context " Matthew Brost
2021-09-09 22:51   ` [Intel-gfx] " John Harrison
2021-09-13 16:54     ` Matthew Brost
2021-09-13 22:38       ` John Harrison
2021-09-14  5:02         ` Matthew Brost
2021-09-13 16:55     ` Matthew Brost
2021-08-20 22:44 ` [PATCH 08/27] drm/i915: Add logical engine mapping Matthew Brost
2021-09-10 11:12   ` [Intel-gfx] " Tvrtko Ursulin
2021-09-10 19:49     ` Matthew Brost
2021-09-13  9:24       ` Tvrtko Ursulin
2021-09-13 16:50         ` Matthew Brost
2021-09-14  8:34           ` Tvrtko Ursulin
2021-09-14 18:04             ` Matthew Brost
2021-09-15  8:24               ` Tvrtko Ursulin
2021-09-15 16:58                 ` Matthew Brost
2021-09-16  8:31                   ` Tvrtko Ursulin
2021-08-20 22:44 ` [PATCH 09/27] drm/i915: Expose logical engine instance to user Matthew Brost
2021-09-13 23:06   ` [Intel-gfx] " John Harrison
2021-09-14  1:08     ` Matthew Brost
2021-08-20 22:44 ` [PATCH 10/27] drm/i915/guc: Introduce context parent-child relationship Matthew Brost
2021-09-13 23:19   ` [Intel-gfx] " John Harrison
2021-09-14  1:18     ` Matthew Brost
2021-08-20 22:44 ` [PATCH 11/27] drm/i915/guc: Implement parallel context pin / unpin functions Matthew Brost
2021-08-20 22:44 ` [PATCH 12/27] drm/i915/guc: Add multi-lrc context registration Matthew Brost
2021-09-15 19:21   ` [Intel-gfx] " John Harrison
2021-09-15 19:31     ` Matthew Brost
2021-09-15 20:23       ` John Harrison
2021-09-15 20:33         ` Matthew Brost
2021-08-20 22:44 ` [PATCH 13/27] drm/i915/guc: Ensure GuC schedule operations do not operate on child contexts Matthew Brost
2021-09-15 19:24   ` [Intel-gfx] " John Harrison
2021-09-15 19:34     ` Matthew Brost
2021-08-20 22:44 ` [PATCH 14/27] drm/i915/guc: Assign contexts in parent-child relationship consecutive guc_ids Matthew Brost
2021-09-15 20:04   ` [Intel-gfx] " John Harrison
2021-09-15 20:55     ` Matthew Brost
2021-08-20 22:44 ` [PATCH 15/27] drm/i915/guc: Implement multi-lrc submission Matthew Brost
2021-08-21 14:04   ` kernel test robot
2021-08-22  2:18   ` kernel test robot
2021-09-20 21:48   ` [Intel-gfx] " John Harrison
2021-09-22 16:25     ` Matthew Brost
2021-09-22 20:15       ` John Harrison
2021-09-23  2:44         ` Matthew Brost
2021-08-20 22:44 ` [PATCH 16/27] drm/i915/guc: Insert submit fences between requests in parent-child relationship Matthew Brost
2021-09-20 21:57   ` [Intel-gfx] " John Harrison
2021-08-20 22:44 ` [PATCH 17/27] drm/i915/guc: Implement multi-lrc reset Matthew Brost
2021-09-20 22:44   ` [Intel-gfx] " John Harrison
2021-09-22 16:16     ` Matthew Brost
2021-08-20 22:44 ` [PATCH 18/27] drm/i915/guc: Update debugfs for GuC multi-lrc Matthew Brost
2021-09-20 22:48   ` [Intel-gfx] " John Harrison
2021-09-21 19:13     ` Matthew Brost
2021-08-20 22:44 ` [PATCH 19/27] drm/i915: Fix bug in user proto-context creation that leaked contexts Matthew Brost
2021-09-20 22:57   ` [Intel-gfx] " John Harrison
2021-09-21 14:49     ` Tvrtko Ursulin
2021-09-21 19:28       ` Matthew Brost
2021-09-21 19:28     ` Matthew Brost
2021-08-20 22:44 ` [PATCH 20/27] drm/i915/guc: Connect UAPI to GuC multi-lrc interface Matthew Brost
2021-08-29  4:00   ` [Intel-gfx] " kernel test robot
2021-08-29 19:59   ` kernel test robot
2021-09-21  0:09   ` John Harrison
2021-09-22 16:38     ` Matthew Brost
2021-08-20 22:44 ` [PATCH 21/27] drm/i915/doc: Update parallel submit doc to point to i915_drm.h Matthew Brost
2021-09-21  0:12   ` [Intel-gfx] " John Harrison
2021-08-20 22:44 ` [PATCH 22/27] drm/i915/guc: Add basic GuC multi-lrc selftest Matthew Brost
2021-09-28 20:47   ` [Intel-gfx] " John Harrison
2021-08-20 22:44 ` [PATCH 23/27] drm/i915/guc: Implement no mid batch preemption for multi-lrc Matthew Brost
2021-09-10 11:25   ` [Intel-gfx] " Tvrtko Ursulin
2021-09-10 20:49     ` Matthew Brost
2021-09-13 10:52       ` Tvrtko Ursulin
2021-09-28 22:20   ` John Harrison
2021-09-28 22:33     ` Matthew Brost
2021-09-28 23:33       ` John Harrison
2021-09-29  0:22         ` Matthew Brost
2021-08-20 22:44 ` [PATCH 24/27] drm/i915: Multi-BB execbuf Matthew Brost
2021-08-21 19:01   ` [Intel-gfx] " kernel test robot
2021-08-30  3:46   ` kernel test robot
2021-09-30 22:16   ` Matthew Brost
2021-08-20 22:44 ` [PATCH 25/27] drm/i915/guc: Handle errors in multi-lrc requests Matthew Brost
2021-09-29 20:44   ` [Intel-gfx] " John Harrison
2021-09-29 20:58     ` Matthew Brost [this message]
2021-08-20 22:44 ` [PATCH 26/27] drm/i915: Enable multi-bb execbuf Matthew Brost
2021-08-20 22:44 ` [PATCH 27/27] drm/i915/execlists: Weak parallel submission support for execlists Matthew Brost

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210929205819.GA10003@jons-linux-dev-box \
    --to=matthew.brost@intel.com \
    --cc=daniel.vetter@ffwll.ch \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=intel-gfx@lists.freedesktop.org \
    --cc=john.c.harrison@intel.com \
    --cc=tony.ye@intel.com \
    --cc=zhengguo.xu@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).