From: Michel Thierry <michel.thierry@intel.com>
To: Chris Wilson <chris@chris-wilson.co.uk>, intel-gfx@lists.freedesktop.org
Subject: Re: [PATCH 2/3] drm/i915/guc: Add support for reset engine using GuC commands
Date: Mon, 30 Oct 2017 21:38:30 -0700
Message-ID: <7b5cb98c-8deb-a89b-0ab5-9512023ffb93@intel.com>
In-Reply-To: <150939774411.4195.5772343800096262944@mail.alporthouse.com>

On 30/10/17 14:09, Chris Wilson wrote:
> Quoting Michel Thierry (2017-10-30 18:56:15)
>> This patch adds per engine reset and recovery (TDR) support when GuC is
>> used to submit workloads to GPU.
>>
>> In the case of i915 submitting directly to ELSP, the driver manages hang
>> detection, recovery and resubmission. With GuC submission these tasks
>> are shared between driver and GuC. i915 is still responsible for detecting
>> a hang, and when it does it only requests GuC to reset that Engine. GuC
>> internally manages acquiring forcewake and idling the engine before actually
>> resetting it.
>>
>> Once the reset is successful, i915 takes over again and handles resubmission.
>> The scheduler in i915 knows which requests are pending so after resetting
>> an engine, pending workloads/requests are resubmitted.
>>
>> v2: s/i915_guc_request_engine_reset/i915_guc_reset_engine/ to match the
>> non-guc function names.
>>
>> v3: Removed debug message about engine restarting from which request,
>> since the new baseline does it regardless of submission mode. (Chris)
>>
>> v4: Rebase.
>>
>> v5: Do not pass unnecessary reporting flags to the fw (Jeff);
>> tasklet_schedule(&execlists->irq_tasklet) handles the resubmit; rebase.
> 
> In your experience, how did our test coverage fare?
> 
> Could you use live_hangcheck effectively? (The drv_selftest would need
> some hand holding to pass along guc options. But for livetesting we
> should probably get to the point of being able to load/unload the guc
> interface so that we cover both execlists and guc.) Did you find
> "gem_exec_whisper --r hang*", did you try gem_concurrent_all?
>   

live_hangcheck runs OK with guc (as long as i915_params.h has guc
submission enabled). Do you see a benefit in adding an option to
drv_selftest to override the submission mode? I can add it to my list.

You got me on gem_concurrent_all; I forgot to schedule it a few weeks ago.

>> +/**
>> + * intel_guc_reset_engine() - ask GuC to reset an engine
>> + * @engine:    engine to be reset
>> + */
>> +int intel_guc_reset_engine(struct intel_engine_cs *engine)
>> +{
>> +       struct drm_i915_private *dev_priv = engine->i915;
>> +       struct intel_guc *guc = &dev_priv->guc;
>> +       u32 data[7];
>> +
>> +       GEM_BUG_ON(!guc->execbuf_client);
>> +
>> +       data[0] = INTEL_GUC_ACTION_REQUEST_ENGINE_RESET;
>> +       data[1] = engine->guc_id;
>> +       data[2] = 0;
>> +       data[3] = 0;
>> +       data[4] = 0;
>> +       data[5] = guc->execbuf_client->stage_id;
>> +       data[6] = guc_ggtt_offset(guc->shared_data);
>> +
>> +       return intel_guc_send(guc, data, ARRAY_SIZE(data));
> 
> Is this a synchronous action? We expect that following the completion of
> the reset routine, we are ready to reinit the hw. The same rule needs to
> apply to the guc, I think.

Right now the action is synchronous: the fw won't reply to the action
until all the steps are completed. It is also fast enough; I haven't
seen it time out (a timeout would be promoted to a full reset and a
reload of the fw). But, do you have a crystal ball?
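
To make the flow above concrete, here is a minimal standalone sketch, not
the driver code. It fills the same 7-dword request as the quoted patch,
sends it through a blocking callback, and escalates to a full reset when
the send fails or times out. Every sketch_* name, the action value and the
two fake transports are assumptions made up for illustration; the real path
goes through intel_guc_send() and the i915 reset code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder value: NOT the real INTEL_GUC_ACTION_REQUEST_ENGINE_RESET code. */
#define SKETCH_ACTION_ENGINE_RESET 0x3u

enum sketch_outcome {
	SKETCH_ENGINE_RESET_OK,	/* fw replied; the engine can be re-initialised */
	SKETCH_FULL_RESET,	/* request failed; escalate to a full reset */
};

/* Hypothetical blocking send: 0 once the fw has replied, negative on error. */
typedef int (*sketch_send_fn)(const uint32_t *msg, size_t len);

/* Fill the 7-dword request in the same order as the quoted patch. */
static void sketch_build_reset_msg(uint32_t msg[7], uint32_t engine_id,
				   uint32_t stage_id, uint32_t shared_offset)
{
	msg[0] = SKETCH_ACTION_ENGINE_RESET;
	msg[1] = engine_id;
	msg[2] = 0;
	msg[3] = 0;
	msg[4] = 0;
	msg[5] = stage_id;
	msg[6] = shared_offset;
}

/*
 * Because the send is synchronous, a zero return means every reset step is
 * done and the caller may reinit the engine and resubmit. Any failure is
 * promoted to a full reset.
 */
static enum sketch_outcome sketch_reset_engine(sketch_send_fn send,
					       uint32_t engine_id,
					       uint32_t stage_id,
					       uint32_t shared_offset)
{
	uint32_t msg[7];

	assert(send != NULL);
	sketch_build_reset_msg(msg, engine_id, stage_id, shared_offset);

	if (send(msg, 7) == 0)
		return SKETCH_ENGINE_RESET_OK;

	return SKETCH_FULL_RESET;
}

/* Fake transport that always succeeds for a well-formed 7-dword message. */
static int sketch_send_ok(const uint32_t *msg, size_t len)
{
	(void)msg;
	return len == 7 ? 0 : -1;
}

/* Fake transport that always fails, standing in for a fw timeout. */
static int sketch_send_timeout(const uint32_t *msg, size_t len)
{
	(void)msg;
	(void)len;
	return -1;
}
```

With sketch_send_ok the helper reports SKETCH_ENGINE_RESET_OK; with
sketch_send_timeout it reports SKETCH_FULL_RESET, which mirrors the
"promoted to full reset" case.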

> -Chris
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

