From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: John Harrison <john.c.harrison@intel.com>,
"Belgaumkar, Vinay" <vinay.belgaumkar@intel.com>,
intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org
Subject: Re: [Intel-gfx] [PATCH] drm/i915/guc/slpc: Use non-blocking H2G for waitboost
Date: Fri, 6 May 2022 08:18:49 +0100 [thread overview]
Message-ID: <1d15f38e-c3d8-a521-4a15-50341dae5000@linux.intel.com> (raw)
In-Reply-To: <032467b7-8794-882a-e45f-6e9d85a716df@intel.com>
On 05/05/2022 19:36, John Harrison wrote:
> On 5/5/2022 10:21, Belgaumkar, Vinay wrote:
>> On 5/5/2022 5:13 AM, Tvrtko Ursulin wrote:
>>> On 05/05/2022 06:40, Vinay Belgaumkar wrote:
>>>> SLPC min/max frequency updates require H2G calls. We are seeing
>>>> timeouts when GuC channel is backed up and it is unable to respond
>>>> in a timely fashion causing warnings and affecting CI.
>>>
>>> Is it the "Unable to force min freq" error? Do you have a link to the
>>> GitLab issue to add to commit message?
>> We don't have a specific error for this one, but have seen similar
>> issues with other H2G which are blocking.
>>>
>>>> This is seen when waitboosting happens during a stress test.
>>>> this patch updates the waitboost path to use a non-blocking
>>>> H2G call instead, which returns as soon as the message is
>>>> successfully transmitted.
>>>
>>> AFAIU with this approach, when CT channel is congested, you instead
>>> achieve silent dropping of the waitboost request, right?
>> We are hoping it makes it, but just not waiting for it to complete.
> We are not 'hoping it makes it'. We know for a fact that it will make
> it. We just don't know when. The issue is not about whether the
> waitboost request itself gets dropped/lost it is about the ack that
> comes back. The GuC will process the message and it will send an ack.
> It's just a question of whether the i915 driver has given up waiting for
> it yet. And if it has, then you get the initial 'timed out waiting for
> ack' followed by a later 'got unexpected ack' message.
>
> Whereas, if we make the call asynchronous, there is no ack. i915 doesn't
> bother waiting and it won't get surprised later.
>
> Also, note that this is only an issue when GuC itself is backed up.
> Normally that requires the creation/destruction of large numbers of
> contexts in rapid succession (context management is about the slowest
> thing we do with GuC). Some of the IGTs and selftests do that with
> thousands of contexts all at once. Those are generally where we see this
> kind of problem. It would be highly unlikely (but not impossible) to hit
> it in real world usage.
Goto ->
> The general design philosophy of H2G messages is that asynchronous mode
> should be used for everything if at all possible. It is fire and forget
> and will all get processed in the order sent (same as batch buffer
> execution, really). Synchronous messages should only be used when an
> ack/status is absolutely required. E.g. start of day initialisation or
> things like TLB invalidation where we need to know that a cache has been
> cleared/flushed before updating memory from the CPU.
>
> John.
>
>
>>>
>>> It sounds like a potentially important feedback from the field to
>>> lose so easily. How about you added drm_notice to the worker when it
>>> fails?
>>>
>>> Or simply a "one line patch" to replace i915_probe_error (!?) with
>>> drm_notice and keep the blocking behavior. (I have no idea what is
>>> the typical time to drain the CT buffer, and so to decide whether
>>> waiting or dropping makes more sense for effectiveness of waitboosting.)
>>>
>>> Or since the congestion /should not/ happen in production, then the
>>> argument is why complicate with more code, in which case going with
>>> one line patch is an easy way forward?
Here. Where I did hint I understood the "should not happen in production
angle".
So statement is GuC is congested in processing requests, but the h2g
buffer is not congested so no chance intel_guc_send_nb() will fail with
no space in that buffer? Sounds a bit un-intuitive.
Anyway, it sounds okay to me to use the non-blocking, but I would like
to see some logging if the unexpected does happen. Hence I was
suggesting the option of adding drm_notice logging if the send fails
from the worker. (Because I think other callers would already propagate
the error, like sysfs.)
err = slpc_force_min_freq(slpc, slpc->boost_freq);
if (!err)
slpc->num_boosts++;
else
drm_notice(... "Failed to send waitboost request (%d)", err);
Something like that.
Regards,
Tvrtko
>> Even if we soften the blow here, the actual timeout error occurs in
>> the intel_guc_ct.c code, so we cannot hide that error anyways. Making
>> this call non-blocking will achieve both things.
>>
>> Thanks,
>>
>> Vinay.
>>
>>>
>>> Regards,
>>>
>>> Tvrtko
>>>
>>>> Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
>>>> ---
>>>> drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c | 38
>>>> ++++++++++++++++-----
>>>> 1 file changed, 30 insertions(+), 8 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c
>>>> b/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c
>>>> index 1db833da42df..c852f73cf521 100644
>>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c
>>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c
>>>> @@ -98,6 +98,30 @@ static u32 slpc_get_state(struct intel_guc_slpc
>>>> *slpc)
>>>> return data->header.global_state;
>>>> }
>>>> +static int guc_action_slpc_set_param_nb(struct intel_guc *guc, u8
>>>> id, u32 value)
>>>> +{
>>>> + u32 request[] = {
>>>> + GUC_ACTION_HOST2GUC_PC_SLPC_REQUEST,
>>>> + SLPC_EVENT(SLPC_EVENT_PARAMETER_SET, 2),
>>>> + id,
>>>> + value,
>>>> + };
>>>> + int ret;
>>>> +
>>>> + ret = intel_guc_send_nb(guc, request, ARRAY_SIZE(request), 0);
>>>> +
>>>> + return ret > 0 ? -EPROTO : ret;
>>>> +}
>>>> +
>>>> +static int slpc_set_param_nb(struct intel_guc_slpc *slpc, u8 id,
>>>> u32 value)
>>>> +{
>>>> + struct intel_guc *guc = slpc_to_guc(slpc);
>>>> +
>>>> + GEM_BUG_ON(id >= SLPC_MAX_PARAM);
>>>> +
>>>> + return guc_action_slpc_set_param_nb(guc, id, value);
>>>> +}
>>>> +
>>>> static int guc_action_slpc_set_param(struct intel_guc *guc, u8 id,
>>>> u32 value)
>>>> {
>>>> u32 request[] = {
>>>> @@ -208,12 +232,10 @@ static int slpc_force_min_freq(struct
>>>> intel_guc_slpc *slpc, u32 freq)
>>>> */
>>>> with_intel_runtime_pm(&i915->runtime_pm, wakeref) {
>>>> - ret = slpc_set_param(slpc,
>>>> - SLPC_PARAM_GLOBAL_MIN_GT_UNSLICE_FREQ_MHZ,
>>>> - freq);
>>>> - if (ret)
>>>> - i915_probe_error(i915, "Unable to force min freq to %u:
>>>> %d",
>>>> - freq, ret);
>>>> + /* Non-blocking request will avoid stalls */
>>>> + ret = slpc_set_param_nb(slpc,
>>>> + SLPC_PARAM_GLOBAL_MIN_GT_UNSLICE_FREQ_MHZ,
>>>> + freq);
>>>> }
>>>> return ret;
>>>> @@ -231,8 +253,8 @@ static void slpc_boost_work(struct work_struct
>>>> *work)
>>>> */
>>>> mutex_lock(&slpc->lock);
>>>> if (atomic_read(&slpc->num_waiters)) {
>>>> - slpc_force_min_freq(slpc, slpc->boost_freq);
>>>> - slpc->num_boosts++;
>>>> + if (!slpc_force_min_freq(slpc, slpc->boost_freq))
>>>> + slpc->num_boosts++;
>>>> }
>>>> mutex_unlock(&slpc->lock);
>>>> }
>
next prev parent reply other threads:[~2022-05-06 7:18 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-05-05 5:40 [Intel-gfx] [PATCH] drm/i915/guc/slpc: Use non-blocking H2G for waitboost Vinay Belgaumkar
2022-05-05 6:37 ` [Intel-gfx] ✓ Fi.CI.BAT: success for " Patchwork
2022-05-05 11:12 ` [Intel-gfx] ✓ Fi.CI.IGT: " Patchwork
2022-05-05 12:13 ` [Intel-gfx] [PATCH] " Tvrtko Ursulin
2022-05-05 17:21 ` Belgaumkar, Vinay
2022-05-05 18:36 ` John Harrison
2022-05-06 7:18 ` Tvrtko Ursulin [this message]
2022-05-06 16:21 ` Belgaumkar, Vinay
2022-05-06 16:43 ` John Harrison
2022-05-15 5:46 ` Belgaumkar, Vinay
-- strict thread matches above, loose matches on Subject: below --
2022-05-15 6:05 Vinay Belgaumkar
2022-05-16 7:59 ` Jani Nikula
2022-05-16 8:00 ` Jani Nikula
2022-06-07 23:02 ` John Harrison
2022-06-07 23:04 ` John Harrison
2022-06-08 7:58 ` Jani Nikula
2022-06-07 22:29 ` Dixit, Ashutosh
2022-06-07 23:15 ` John Harrison
2022-06-08 17:39 ` Dixit, Ashutosh
2022-06-22 0:26 ` Dixit, Ashutosh
2022-06-22 20:30 ` Belgaumkar, Vinay
2022-06-22 21:28 ` Dixit, Ashutosh
2022-06-23 8:12 ` Tvrtko Ursulin
2022-06-23 0:32 Vinay Belgaumkar
2022-06-23 0:53 ` Dixit, Ashutosh
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1d15f38e-c3d8-a521-4a15-50341dae5000@linux.intel.com \
--to=tvrtko.ursulin@linux.intel.com \
--cc=dri-devel@lists.freedesktop.org \
--cc=intel-gfx@lists.freedesktop.org \
--cc=john.c.harrison@intel.com \
--cc=vinay.belgaumkar@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox