From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: John.C.Harrison@Intel.com, Intel-GFX@Lists.FreeDesktop.Org
Cc: DRI-Devel@Lists.FreeDesktop.Org
Subject: Re: [Intel-gfx] [PATCH 1/3] drm/i915/guc: Limit scheduling properties to avoid overflow
Date: Tue, 22 Feb 2022 10:39:43 +0000 [thread overview]
Message-ID: <1bc7257b-ddb6-1a5b-119f-3ef89cd5e634@linux.intel.com> (raw)
In-Reply-To: <0d0c5a79-1285-0830-3794-e9f0644811a5@linux.intel.com>
On 22/02/2022 09:52, Tvrtko Ursulin wrote:
>
> On 18/02/2022 21:33, John.C.Harrison@Intel.com wrote:
>> From: John Harrison <John.C.Harrison@Intel.com>
>>
>> GuC converts the pre-emption timeout and timeslice quantum values into
>> clock ticks internally. That significantly reduces the point of 32bit
>> overflow. On current platforms, worst case scenario is approximately
>
> Where does 32-bit come from, the GuC side? We already use 64-bits so
> that's something to fix to start with. Yep...
>
> ./gt/uc/intel_guc_fwif.h: u32 execution_quantum;
>
> ./gt/uc/intel_guc_submission.c: desc->execution_quantum = engine->props.timeslice_duration_ms * 1000;
>
> ./gt/intel_engine_types.h: unsigned long timeslice_duration_ms;
>
> timeslice_store/preempt_timeout_store:
> err = kstrtoull(buf, 0, &duration);
>
> So both kconfig and sysfs can already overflow GuC, not only because of
> the tick conversion internally but because at the backend level nothing
> was done about assigning a 64-bit value into a 32-bit field. Or I failed
> to find where it is handled.
>
>> 110 seconds. Rather than allowing the user to set higher values and
>> then get confused by early timeouts, add limits when setting these
>> values.
>
> Btw who is reviewing GuC patches these days - things have somehow gotten
> pretty quiet in activity and I don't think that's due to an absence of
> stuff to improve or fix? Asking since I think I noticed a few already
> which you posted and then crickets on the mailing list.
>
>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>> ---
>> drivers/gpu/drm/i915/gt/intel_engine_cs.c | 15 +++++++++++++++
>> drivers/gpu/drm/i915/gt/sysfs_engines.c | 14 ++++++++++++++
>> drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h | 9 +++++++++
>> 3 files changed, 38 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
>> index e53008b4dd05..2a1e9f36e6f5 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
>> @@ -389,6 +389,21 @@ static int intel_engine_setup(struct intel_gt *gt, enum intel_engine_id id,
>> if (GRAPHICS_VER(i915) == 12 && engine->class == RENDER_CLASS)
>> engine->props.preempt_timeout_ms = 0;
>> + /* Cap timeouts to prevent overflow inside GuC */
>> + if (intel_guc_submission_is_wanted(&gt->uc.guc)) {
>> + if (engine->props.timeslice_duration_ms > GUC_POLICY_MAX_EXEC_QUANTUM_MS) {
>
> Hm "wanted".. There's been too much back and forth on the GuC load
> options over the years to keep track.. intel_engine_uses_guc() sounds
> like it would work and read nicer.
>
> And limiting to class instead of applying to all engines looks like a miss.

Sorry, limiting to class does not apply here; I confused this with the
last patch.
Regards,
Tvrtko
>
>> + drm_info(&engine->i915->drm, "Warning, clamping timeslice duration to %d to prevent possibly overflow\n",
>> + GUC_POLICY_MAX_EXEC_QUANTUM_MS);
>> + engine->props.timeslice_duration_ms = GUC_POLICY_MAX_EXEC_QUANTUM_MS;
>
> I am not sure logging such a message during driver load is useful. It
> sounds more like a confused driver which starts with one value and then
> overrides itself. I'd just silently set the value appropriate for the
> active backend. The preemption timeout kconfig text already documents the
> fact that timeouts can get overridden at runtime depending on
> platform+engine, so maybe just add the same text to the timeslice kconfig.
>
>> + }
>> +
>> + if (engine->props.preempt_timeout_ms > GUC_POLICY_MAX_PREEMPT_TIMEOUT_MS) {
>> + drm_info(&engine->i915->drm, "Warning, clamping pre-emption timeout to %d to prevent possibly overflow\n",
>> + GUC_POLICY_MAX_PREEMPT_TIMEOUT_MS);
>> + engine->props.preempt_timeout_ms = GUC_POLICY_MAX_PREEMPT_TIMEOUT_MS;
>> + }
>> + }
>> +
>> engine->defaults = engine->props; /* never to change again */
>> engine->context_size = intel_engine_context_size(gt, engine->class);
>> diff --git a/drivers/gpu/drm/i915/gt/sysfs_engines.c b/drivers/gpu/drm/i915/gt/sysfs_engines.c
>> index 967031056202..f57efe026474 100644
>> --- a/drivers/gpu/drm/i915/gt/sysfs_engines.c
>> +++ b/drivers/gpu/drm/i915/gt/sysfs_engines.c
>> @@ -221,6 +221,13 @@ timeslice_store(struct kobject *kobj, struct kobj_attribute *attr,
>> if (duration > jiffies_to_msecs(MAX_SCHEDULE_TIMEOUT))
>> return -EINVAL;
>> + if (intel_uc_uses_guc_submission(&engine->gt->uc) &&
>> + duration > GUC_POLICY_MAX_EXEC_QUANTUM_MS) {
>> + duration = GUC_POLICY_MAX_EXEC_QUANTUM_MS;
>> + drm_info(&engine->i915->drm, "Warning, clamping timeslice duration to %lld to prevent possibly overflow\n",
>> + duration);
>> + }
>
> I would suggest avoiding the duplicated clamping logic. Maybe hide all
> the backend logic in helpers then, like maybe:
>
> d = intel_engine_validate_timeslice/preempt_timeout(engine, duration);
> if (d != duration)
> 	return -EINVAL;
>
> Returning -EINVAL would be equivalent to the existing behaviour:
>
> if (duration > jiffies_to_msecs(MAX_SCHEDULE_TIMEOUT))
> 	return -EINVAL;
>
> That way userspace gets explicit notification and the read-back value is
> identical to the written-in value. From engine setup you can just call
> the helper silently.
>
>> +
>> WRITE_ONCE(engine->props.timeslice_duration_ms, duration);
>> if (execlists_active(&engine->execlists))
>> @@ -325,6 +332,13 @@ preempt_timeout_store(struct kobject *kobj, struct kobj_attribute *attr,
>> if (timeout > jiffies_to_msecs(MAX_SCHEDULE_TIMEOUT))
>> return -EINVAL;
>> + if (intel_uc_uses_guc_submission(&engine->gt->uc) &&
>> + timeout > GUC_POLICY_MAX_PREEMPT_TIMEOUT_MS) {
>> + timeout = GUC_POLICY_MAX_PREEMPT_TIMEOUT_MS;
>> + drm_info(&engine->i915->drm, "Warning, clamping pre-emption timeout to %lld to prevent possibly overflow\n",
>> + timeout);
>> + }
>> +
>> WRITE_ONCE(engine->props.preempt_timeout_ms, timeout);
>> if (READ_ONCE(engine->execlists.pending[0]))
>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
>> index 6a4612a852e2..ad131092f8df 100644
>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
>> @@ -248,6 +248,15 @@ struct guc_lrc_desc {
>> #define GLOBAL_POLICY_DEFAULT_DPC_PROMOTE_TIME_US 500000
>> +/*
>> + * GuC converts the timeout to clock ticks internally. Different platforms have
>> + * different GuC clocks. Thus, the maximum value before overflow is platform
>> + * dependent. Current worst case scenario is about 110s. So, limit to 100s to be
>> + * safe.
>> + */
>> +#define GUC_POLICY_MAX_EXEC_QUANTUM_MS (100 * 1000)
>> +#define GUC_POLICY_MAX_PREEMPT_TIMEOUT_MS (100 * 1000)
>
> Most important question -
> how will we know/notice if/when a new GuC arrives where these timeouts
> would still overflow? Can this be queried somehow at runtime, or where
> does the limit come from? How is GuC told about it? Is it set in some
> field which just silently allows too-large values to break things?
>
> Regards,
>
> Tvrtko
>
>> +
>> struct guc_policies {
>> u32 submission_queue_depth[GUC_MAX_ENGINE_CLASSES];
>> /* In micro seconds. How much time to allow before DPC
>> processing is