Re: [Intel-gfx] [PATCH] drm/i915/guc/slpc: Use non-blocking H2G for waitboost

Intel-GFX Archive on lore.kernel.org
 help / color / mirror / Atom feed

From: John Harrison <john.c.harrison@intel.com>
To: "Belgaumkar, Vinay" <vinay.belgaumkar@intel.com>,
	Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>,
	<intel-gfx@lists.freedesktop.org>,
	<dri-devel@lists.freedesktop.org>
Subject: Re: [Intel-gfx] [PATCH] drm/i915/guc/slpc: Use non-blocking H2G for waitboost
Date: Thu, 5 May 2022 11:36:29 -0700	[thread overview]
Message-ID: <032467b7-8794-882a-e45f-6e9d85a716df@intel.com> (raw)
In-Reply-To: <8c8dea74-aa2b-44c1-e1d7-1f613c05bf32@intel.com>

On 5/5/2022 10:21, Belgaumkar, Vinay wrote:
> On 5/5/2022 5:13 AM, Tvrtko Ursulin wrote:
>> On 05/05/2022 06:40, Vinay Belgaumkar wrote:
>>> SLPC min/max frequency updates require H2G calls. We are seeing
>>> timeouts when GuC channel is backed up and it is unable to respond
>>> in a timely fashion causing warnings and affecting CI.
>>
>> Is it the "Unable to force min freq" error? Do you have a link to the 
>> GitLab issue to add to commit message?
> We don't have a specific error for this one, but have seen similar 
> issues with other H2G which are blocking.
>>
>>> This is seen when waitboosting happens during a stress test.
>>> this patch updates the waitboost path to use a non-blocking
>>> H2G call instead, which returns as soon as the message is
>>> successfully transmitted.
>>
>> AFAIU with this approach, when CT channel is congested, you instead 
>> achieve silent dropping of the waitboost request, right?
> We are hoping it makes it, but just not waiting for it to complete.
We are not 'hoping it makes it'. We know for a fact that it will make 
it. We just don't know when. The issue is not about whether the 
waitboost request itself gets dropped/lost it is about the ack that 
comes back. The GuC will process the message and it will send an ack. 
It's just a question of whether the i915 driver has given up waiting for 
it yet. And if it has, then you get the initial 'timed out waiting for 
ack' followed by a later 'got unexpected ack' message.

Whereas, if we make the call asynchronous, there is no ack. i915 doesn't 
bother waiting and it won't get surprised later.

Also, note that this is only an issue when GuC itself is backed up. 
Normally that requires the creation/destruction of large numbers of 
contexts in rapid succession (context management is about the slowest 
thing we do with GuC). Some of the IGTs and selftests do that with 
thousands of contexts all at once. Those are generally where we see this 
kind of problem. It would be highly unlikely (but not impossible) to hit 
it in real world usage.

The general design philosophy of H2G messages is that asynchronous mode 
should be used for everything if at all possible. It is fire and forget 
and will all get processed in the order sent (same as batch buffer 
execution, really). Synchronous messages should only be used when an 
ack/status is absolutely required. E.g. start of day initialisation or 
things like TLB invalidation where we need to know that a cache has been 
cleared/flushed before updating memory from the CPU.

John.


>>
>> It sounds like a potentially important feedback from the field to 
>> lose so easily. How about you added drm_notice to the worker when it 
>> fails?
>>
>> Or simply a "one line patch" to replace i915_probe_error (!?) with 
>> drm_notice and keep the blocking behavior. (I have no idea what is 
>> the typical time to drain the CT buffer, and so to decide whether 
>> waiting or dropping makes more sense for effectiveness of waitboosting.)
>>
>> Or since the congestion /should not/ happen in production, then the 
>> argument is why complicate with more code, in which case going with 
>> one line patch is an easy way forward?
>
> Even if we soften the blow here, the actual timeout error occurs in 
> the intel_guc_ct.c code, so we cannot hide that error anyways. Making 
> this call non-blocking will achieve both things.
>
> Thanks,
>
> Vinay.
>
>>
>> Regards,
>>
>> Tvrtko
>>
>>> Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
>>> ---
>>>   drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c | 38 
>>> ++++++++++++++++-----
>>>   1 file changed, 30 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c 
>>> b/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c
>>> index 1db833da42df..c852f73cf521 100644
>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c
>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c
>>> @@ -98,6 +98,30 @@ static u32 slpc_get_state(struct intel_guc_slpc 
>>> *slpc)
>>>       return data->header.global_state;
>>>   }
>>>   +static int guc_action_slpc_set_param_nb(struct intel_guc *guc, u8 
>>> id, u32 value)
>>> +{
>>> +    u32 request[] = {
>>> +        GUC_ACTION_HOST2GUC_PC_SLPC_REQUEST,
>>> +        SLPC_EVENT(SLPC_EVENT_PARAMETER_SET, 2),
>>> +        id,
>>> +        value,
>>> +    };
>>> +    int ret;
>>> +
>>> +    ret = intel_guc_send_nb(guc, request, ARRAY_SIZE(request), 0);
>>> +
>>> +    return ret > 0 ? -EPROTO : ret;
>>> +}
>>> +
>>> +static int slpc_set_param_nb(struct intel_guc_slpc *slpc, u8 id, 
>>> u32 value)
>>> +{
>>> +    struct intel_guc *guc = slpc_to_guc(slpc);
>>> +
>>> +    GEM_BUG_ON(id >= SLPC_MAX_PARAM);
>>> +
>>> +    return guc_action_slpc_set_param_nb(guc, id, value);
>>> +}
>>> +
>>>   static int guc_action_slpc_set_param(struct intel_guc *guc, u8 id, 
>>> u32 value)
>>>   {
>>>       u32 request[] = {
>>> @@ -208,12 +232,10 @@ static int slpc_force_min_freq(struct 
>>> intel_guc_slpc *slpc, u32 freq)
>>>        */
>>>         with_intel_runtime_pm(&i915->runtime_pm, wakeref) {
>>> -        ret = slpc_set_param(slpc,
>>> - SLPC_PARAM_GLOBAL_MIN_GT_UNSLICE_FREQ_MHZ,
>>> -                     freq);
>>> -        if (ret)
>>> -            i915_probe_error(i915, "Unable to force min freq to %u: 
>>> %d",
>>> -                     freq, ret);
>>> +        /* Non-blocking request will avoid stalls */
>>> +        ret = slpc_set_param_nb(slpc,
>>> + SLPC_PARAM_GLOBAL_MIN_GT_UNSLICE_FREQ_MHZ,
>>> +                    freq);
>>>       }
>>>         return ret;
>>> @@ -231,8 +253,8 @@ static void slpc_boost_work(struct work_struct 
>>> *work)
>>>        */
>>>       mutex_lock(&slpc->lock);
>>>       if (atomic_read(&slpc->num_waiters)) {
>>> -        slpc_force_min_freq(slpc, slpc->boost_freq);
>>> -        slpc->num_boosts++;
>>> +        if (!slpc_force_min_freq(slpc, slpc->boost_freq))
>>> +            slpc->num_boosts++;
>>>       }
>>>       mutex_unlock(&slpc->lock);
>>>   }

next prev parent reply	other threads:[~2022-05-05 18:36 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-05-05  5:40 [Intel-gfx] [PATCH] drm/i915/guc/slpc: Use non-blocking H2G for waitboost Vinay Belgaumkar
2022-05-05  6:37 ` [Intel-gfx] ✓ Fi.CI.BAT: success for " Patchwork
2022-05-05 11:12 ` [Intel-gfx] ✓ Fi.CI.IGT: " Patchwork
2022-05-05 12:13 ` [Intel-gfx] [PATCH] " Tvrtko Ursulin
2022-05-05 17:21   ` Belgaumkar, Vinay
2022-05-05 18:36     ` John Harrison [this message]
2022-05-06  7:18       ` Tvrtko Ursulin
2022-05-06 16:21         ` Belgaumkar, Vinay
2022-05-06 16:43         ` John Harrison
2022-05-15  5:46           ` Belgaumkar, Vinay
  -- strict thread matches above, loose matches on Subject: below --
2022-05-15  6:05 Vinay Belgaumkar
2022-05-16  7:59 ` Jani Nikula
2022-05-16  8:00   ` Jani Nikula
2022-06-07 23:02   ` John Harrison
2022-06-07 23:04     ` John Harrison
2022-06-08  7:58       ` Jani Nikula
2022-06-07 22:29 ` Dixit, Ashutosh
2022-06-07 23:15   ` John Harrison
2022-06-08 17:39     ` Dixit, Ashutosh
2022-06-22  0:26 ` Dixit, Ashutosh
2022-06-22 20:30   ` Belgaumkar, Vinay
2022-06-22 21:28     ` Dixit, Ashutosh
2022-06-23  8:12       ` Tvrtko Ursulin
2022-06-23  0:32 Vinay Belgaumkar
2022-06-23  0:53 ` Dixit, Ashutosh

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=032467b7-8794-882a-e45f-6e9d85a716df@intel.com \
    --to=john.c.harrison@intel.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=intel-gfx@lists.freedesktop.org \
    --cc=tvrtko.ursulin@linux.intel.com \
    --cc=vinay.belgaumkar@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox