public inbox for intel-gfx@lists.freedesktop.org
 help / color / mirror / Atom feed
From: Dave Gordon <david.s.gordon@intel.com>
To: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>,
	intel-gfx@lists.freedesktop.org
Subject: Re: [PATCH 2/3] drm/i915/guc: downgrade some DRM_ERROR() messages to DRM_WARN()
Date: Tue, 12 Jul 2016 13:29:54 +0100	[thread overview]
Message-ID: <5784E2C2.7020506@intel.com> (raw)
In-Reply-To: <5784B66B.7010502@linux.intel.com>

On 12/07/16 10:20, Tvrtko Ursulin wrote:
>
> On 11/07/16 19:01, Dave Gordon wrote:
>> Where we're going to continue regardless of the problem, rather than
>> fail, then the message should be a WARNing rather than an ERROR.
>>
>> Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
>> ---
>>   drivers/gpu/drm/i915/i915_guc_submission.c | 18 +++++++-----------
>>   1 file changed, 7 insertions(+), 11 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c
>> b/drivers/gpu/drm/i915/i915_guc_submission.c
>> index 2112e02..e299b64 100644
>> --- a/drivers/gpu/drm/i915/i915_guc_submission.c
>> +++ b/drivers/gpu/drm/i915/i915_guc_submission.c
>> @@ -114,10 +114,8 @@ static int host2guc_action(struct intel_guc *guc,
>> u32 *data, u32 len)
>>           if (ret != -ETIMEDOUT)
>>               ret = -EIO;
>>
>> -        DRM_ERROR("GUC: host2guc action 0x%X failed. ret=%d "
>> -                "status=0x%08X response=0x%08X\n",
>> -                data[0], ret, status,
>> -                I915_READ(SOFT_SCRATCH(15)));
>> +        DRM_WARN("Action 0x%X failed; ret=%d status=0x%08X
>> response=0x%08X\n",
>> +             data[0], ret, status, I915_READ(SOFT_SCRATCH(15)));
>
> Hm, this does propagate the error code to the callers some which will
> act and log the failure. Majority won't though - like suspend/resume
> etc. In those cases it feels more like an error than a warning.

It's definitely something that shouldn't happen, so we need to log it; 
and it has to be done at this level because we don't pass enough 
information back to leave it to the caller. But OTOH this layer doesn't 
have enough information to determine just how serious a failure is.

So as a compromise the idea is to log a WARNING here, and then the 
caller can choose to:
1. pass the failure up (until we reach a layer with more context)
2. quietly disregard the failure and continue anyway
3. report an ERROR and fail/abort the process.

That way we should get all the useful information about the root cause 
of something that ends up as an ERROR, while neither ignoring nor being 
too verbose about failures from which we may ultimately recover.

For example, one of the callers (the doorbell h/w initialisation code) 
considers some failures as interesting but not critical (DEBUG level) 
but other instances of the exact same operation are fatal (ERROR).

>>           dev_priv->guc.action_fail += 1;
>>           dev_priv->guc.action_err = ret;
>> @@ -553,8 +551,8 @@ static int guc_ring_doorbell(struct
>> i915_guc_client *gc)
>>           if (db_ret.db_status == GUC_DOORBELL_DISABLED)
>>               break;
>>
>> -        DRM_ERROR("Cookie mismatch. Expected %d, returned %d\n",
>> -              db_cmp.cookie, db_ret.cookie);
>> +        DRM_WARN("Cookie mismatch. Expected %d, found %d\n",
>> +             db_cmp.cookie, db_ret.cookie);
>
> This one is interesting, error is propagated out a bit but then ignored
> in actual command submission.
>
> If the above message means command will not be submitted error is
> probably more appropriate. Or perhaps we cannot tell if the command was
> submitted or not in this case?

Note that this is inside a retry loop. It shouldn't ever happen, but if 
it does we'll report it and try one more time. If it keeps happening 
(which would require active interference by some other party) we won't 
be able to ring the doorbell and the failure will be propagated back out 
of the GuC submission code. OTOH the caller then ignores it because 
"submission is not allowed to fail" (!)

And yes, it is then undefined as to whether the command has been 
submitted or not. If it hasn't we'll expect a GPU hang later.

.Dave.

>>           /* update the cookie to newly read cookie from GuC */
>>           db_cmp.cookie = db_ret.cookie;
>> @@ -726,8 +724,8 @@ static void guc_init_doorbell_hw(struct intel_guc
>> *guc)
>>       /* Restore to original value */
>>       err = guc_update_doorbell_id(guc, client, db_id);
>>       if (err)
>> -        DRM_ERROR("Failed to restore doorbell to %d, err %d\n",
>> -            db_id, err);
>> +        DRM_WARN("Failed to restore doorbell to %d, err %d\n",
>> +             db_id, err);
>>
>>       for (i = 0; i < GUC_MAX_DOORBELLS; ++i) {
>>           i915_reg_t drbreg = GEN8_DRBREGL(i);
>> @@ -819,8 +817,6 @@ static void guc_init_doorbell_hw(struct intel_guc
>> *guc)
>>       return client;
>>
>>   err:
>> -    DRM_ERROR("FAILED to create priority %u GuC client!\n", priority);
>> -
>>       guc_client_free(dev_priv, client);
>>       return NULL;
>>   }
>> @@ -998,7 +994,7 @@ int i915_guc_submission_enable(struct
>> drm_i915_private *dev_priv)
>>                     GUC_CTX_PRIORITY_KMD_NORMAL,
>>                     dev_priv->kernel_context);
>>       if (!client) {
>> -        DRM_ERROR("Failed to create execbuf guc_client\n");
>> +        DRM_ERROR("Failed to create normal GuC client!\n");
>>           return -ENOMEM;
>>       }
>>
>>
>
> Regards,
> Tvrtko

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

  parent reply	other threads:[~2016-07-12 12:29 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-07-11 18:01 [PATCH 1/3] drm: extra printk() wrapper macros Dave Gordon
2016-07-11 18:01 ` [PATCH 2/3] drm/i915/guc: downgrade some DRM_ERROR() messages to DRM_WARN() Dave Gordon
2016-07-12  9:20   ` Tvrtko Ursulin
2016-07-12  9:27     ` [Intel-gfx] " Chris Wilson
2016-07-12  9:37       ` Tvrtko Ursulin
2016-07-12 12:29     ` Dave Gordon [this message]
2016-07-11 18:01 ` [PATCH 3/3] drm/i915/guc: revisit GuC loader message levels Dave Gordon
2016-07-12  9:26   ` [Intel-gfx] " Tvrtko Ursulin
2016-07-12 15:11     ` Dave Gordon
2016-07-12  9:06 ` [PATCH 1/3] drm: extra printk() wrapper macros Tvrtko Ursulin
2016-07-12 13:28   ` Dave Gordon
2016-07-12 13:51     ` [Intel-gfx] " Tvrtko Ursulin
2016-07-12 11:06 ` Eric Engestrom
2016-07-12 14:25 ` Daniel Vetter
2016-07-12 14:53   ` Dave Gordon
2016-07-12 14:59     ` Daniel Vetter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5784E2C2.7020506@intel.com \
    --to=david.s.gordon@intel.com \
    --cc=intel-gfx@lists.freedesktop.org \
    --cc=tvrtko.ursulin@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox