Intel-GFX Archive on lore.kernel.org
From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: John Harrison <john.c.harrison@intel.com>,
	"Belgaumkar, Vinay" <vinay.belgaumkar@intel.com>,
	intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org
Subject: Re: [Intel-gfx] [PATCH] drm/i915/guc/slpc: Use non-blocking H2G for waitboost
Date: Fri, 6 May 2022 08:18:49 +0100	[thread overview]
Message-ID: <1d15f38e-c3d8-a521-4a15-50341dae5000@linux.intel.com> (raw)
In-Reply-To: <032467b7-8794-882a-e45f-6e9d85a716df@intel.com>


On 05/05/2022 19:36, John Harrison wrote:
> On 5/5/2022 10:21, Belgaumkar, Vinay wrote:
>> On 5/5/2022 5:13 AM, Tvrtko Ursulin wrote:
>>> On 05/05/2022 06:40, Vinay Belgaumkar wrote:
>>>> SLPC min/max frequency updates require H2G calls. We are seeing
>>>> timeouts when the GuC channel is backed up and is unable to respond
>>>> in a timely fashion, causing warnings and affecting CI.
>>>
>>> Is it the "Unable to force min freq" error? Do you have a link to the 
>>> GitLab issue to add to commit message?
>> We don't have a specific error for this one, but have seen similar 
>> issues with other H2G which are blocking.
>>>
>>>> This is seen when waitboosting happens during a stress test.
>>>> This patch updates the waitboost path to use a non-blocking
>>>> H2G call instead, which returns as soon as the message is
>>>> successfully transmitted.
>>>
>>> AFAIU with this approach, when CT channel is congested, you instead 
>>> achieve silent dropping of the waitboost request, right?
>> We are hoping it makes it, but just not waiting for it to complete.
> We are not 'hoping it makes it'. We know for a fact that it will make 
> it. We just don't know when. The issue is not about whether the 
> waitboost request itself gets dropped/lost; it is about the ack that 
> comes back. The GuC will process the message and it will send an ack. 
> It's just a question of whether the i915 driver has given up waiting for 
> it yet. And if it has, then you get the initial 'timed out waiting for 
> ack' followed by a later 'got unexpected ack' message.
> 
> Whereas, if we make the call asynchronous, there is no ack. i915 doesn't 
> bother waiting and it won't get surprised later.
> 
> Also, note that this is only an issue when GuC itself is backed up. 
> Normally that requires the creation/destruction of large numbers of 
> contexts in rapid succession (context management is about the slowest 
> thing we do with GuC). Some of the IGTs and selftests do that with 
> thousands of contexts all at once. Those are generally where we see this 
> kind of problem. It would be highly unlikely (but not impossible) to hit 
> it in real world usage.

Goto ->

> The general design philosophy of H2G messages is that asynchronous mode 
> should be used for everything if at all possible. It is fire and forget 
> and will all get processed in the order sent (same as batch buffer 
> execution, really). Synchronous messages should only be used when an 
> ack/status is absolutely required. E.g. start of day initialisation or 
> things like TLB invalidation where we need to know that a cache has been 
> cleared/flushed before updating memory from the CPU.
> 
> John.
> 
> 
>>>
>>> It sounds like a potentially important feedback from the field to 
>>> lose so easily. How about you added drm_notice to the worker when it 
>>> fails?
>>>
>>> Or simply a "one line patch" to replace i915_probe_error (!?) with 
>>> drm_notice and keep the blocking behavior. (I have no idea what is 
>>> the typical time to drain the CT buffer, and so to decide whether 
>>> waiting or dropping makes more sense for effectiveness of waitboosting.)
>>>
>>> Or since the congestion /should not/ happen in production, then the 
>>> argument is why complicate with more code, in which case going with 
>>> one line patch is an easy way forward?

Here. Where I did hint that I understood the "should not happen in 
production" angle.

So the statement is that the GuC is congested in processing requests, 
but the H2G buffer is not congested, so there is no chance 
intel_guc_send_nb() will fail with no space in that buffer? Sounds a 
bit unintuitive.

Anyway, it sounds okay to me to use the non-blocking call, but I would 
like to see some logging if the unexpected does happen. Hence I was 
suggesting the option of adding drm_notice logging if the send fails 
from the worker. (Because I think the other callers, like sysfs, would 
already propagate the error.)

   err = slpc_force_min_freq(slpc, slpc->boost_freq);
   if (!err)
        slpc->num_boosts++;
   else
        drm_notice(... "Failed to send waitboost request (%d)", err);

Something like that.

Regards,

Tvrtko


>> Even if we soften the blow here, the actual timeout error occurs in 
>> the intel_guc_ct.c code, so we cannot hide that error anyway. Making 
>> this call non-blocking will achieve both things.
>>
>> Thanks,
>>
>> Vinay.
>>
>>>
>>> Regards,
>>>
>>> Tvrtko
>>>
>>>> Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
>>>> ---
>>>>   drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c | 38 
>>>> ++++++++++++++++-----
>>>>   1 file changed, 30 insertions(+), 8 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c 
>>>> b/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c
>>>> index 1db833da42df..c852f73cf521 100644
>>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c
>>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c
>>>> @@ -98,6 +98,30 @@ static u32 slpc_get_state(struct intel_guc_slpc 
>>>> *slpc)
>>>>       return data->header.global_state;
>>>>   }
>>>>   +static int guc_action_slpc_set_param_nb(struct intel_guc *guc, u8 
>>>> id, u32 value)
>>>> +{
>>>> +    u32 request[] = {
>>>> +        GUC_ACTION_HOST2GUC_PC_SLPC_REQUEST,
>>>> +        SLPC_EVENT(SLPC_EVENT_PARAMETER_SET, 2),
>>>> +        id,
>>>> +        value,
>>>> +    };
>>>> +    int ret;
>>>> +
>>>> +    ret = intel_guc_send_nb(guc, request, ARRAY_SIZE(request), 0);
>>>> +
>>>> +    return ret > 0 ? -EPROTO : ret;
>>>> +}
>>>> +
>>>> +static int slpc_set_param_nb(struct intel_guc_slpc *slpc, u8 id, 
>>>> u32 value)
>>>> +{
>>>> +    struct intel_guc *guc = slpc_to_guc(slpc);
>>>> +
>>>> +    GEM_BUG_ON(id >= SLPC_MAX_PARAM);
>>>> +
>>>> +    return guc_action_slpc_set_param_nb(guc, id, value);
>>>> +}
>>>> +
>>>>   static int guc_action_slpc_set_param(struct intel_guc *guc, u8 id, 
>>>> u32 value)
>>>>   {
>>>>       u32 request[] = {
>>>> @@ -208,12 +232,10 @@ static int slpc_force_min_freq(struct 
>>>> intel_guc_slpc *slpc, u32 freq)
>>>>        */
>>>>         with_intel_runtime_pm(&i915->runtime_pm, wakeref) {
>>>> -        ret = slpc_set_param(slpc,
>>>> - SLPC_PARAM_GLOBAL_MIN_GT_UNSLICE_FREQ_MHZ,
>>>> -                     freq);
>>>> -        if (ret)
>>>> -            i915_probe_error(i915, "Unable to force min freq to %u: 
>>>> %d",
>>>> -                     freq, ret);
>>>> +        /* Non-blocking request will avoid stalls */
>>>> +        ret = slpc_set_param_nb(slpc,
>>>> + SLPC_PARAM_GLOBAL_MIN_GT_UNSLICE_FREQ_MHZ,
>>>> +                    freq);
>>>>       }
>>>>         return ret;
>>>> @@ -231,8 +253,8 @@ static void slpc_boost_work(struct work_struct 
>>>> *work)
>>>>        */
>>>>       mutex_lock(&slpc->lock);
>>>>       if (atomic_read(&slpc->num_waiters)) {
>>>> -        slpc_force_min_freq(slpc, slpc->boost_freq);
>>>> -        slpc->num_boosts++;
>>>> +        if (!slpc_force_min_freq(slpc, slpc->boost_freq))
>>>> +            slpc->num_boosts++;
>>>>       }
>>>>       mutex_unlock(&slpc->lock);
>>>>   }
> 

Thread overview: 25+ messages
2022-05-05  5:40 [Intel-gfx] [PATCH] drm/i915/guc/slpc: Use non-blocking H2G for waitboost Vinay Belgaumkar
2022-05-05  6:37 ` [Intel-gfx] ✓ Fi.CI.BAT: success for " Patchwork
2022-05-05 11:12 ` [Intel-gfx] ✓ Fi.CI.IGT: " Patchwork
2022-05-05 12:13 ` [Intel-gfx] [PATCH] " Tvrtko Ursulin
2022-05-05 17:21   ` Belgaumkar, Vinay
2022-05-05 18:36     ` John Harrison
2022-05-06  7:18       ` Tvrtko Ursulin [this message]
2022-05-06 16:21         ` Belgaumkar, Vinay
2022-05-06 16:43         ` John Harrison
2022-05-15  5:46           ` Belgaumkar, Vinay
  -- strict thread matches above, loose matches on Subject: below --
2022-05-15  6:05 Vinay Belgaumkar
2022-05-16  7:59 ` Jani Nikula
2022-05-16  8:00   ` Jani Nikula
2022-06-07 23:02   ` John Harrison
2022-06-07 23:04     ` John Harrison
2022-06-08  7:58       ` Jani Nikula
2022-06-07 22:29 ` Dixit, Ashutosh
2022-06-07 23:15   ` John Harrison
2022-06-08 17:39     ` Dixit, Ashutosh
2022-06-22  0:26 ` Dixit, Ashutosh
2022-06-22 20:30   ` Belgaumkar, Vinay
2022-06-22 21:28     ` Dixit, Ashutosh
2022-06-23  8:12       ` Tvrtko Ursulin
2022-06-23  0:32 Vinay Belgaumkar
2022-06-23  0:53 ` Dixit, Ashutosh
