Intel-GFX Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: "Belgaumkar, Vinay" <vinay.belgaumkar@intel.com>
To: "Dixit, Ashutosh" <ashutosh.dixit@intel.com>
Cc: intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
	Rodrigo Vivi <rodrigo.vivi@intel.com>
Subject: Re: [Intel-gfx] [PATCH] drm/i915/guc: Disable PL1 power limit when loading GuC firmware
Date: Fri, 24 Mar 2023 17:06:33 -0700	[thread overview]
Message-ID: <7b24abc4-30df-227c-bb6c-04de96e59f40@intel.com> (raw)
In-Reply-To: <87sfdtload.wl-ashutosh.dixit@intel.com>


On 3/24/2023 4:31 PM, Dixit, Ashutosh wrote:
> On Fri, 24 Mar 2023 11:15:02 -0700, Belgaumkar, Vinay wrote:
> Hi Vinay,
>
> Thanks for the review. Comments inline below.
Sorry about asking the same questions all over again :) Didn't look at 
previous versions.
>
>> On 3/15/2023 8:59 PM, Ashutosh Dixit wrote:
>>> On dGfx, the PL1 power limit being enabled and set to a low value results
>>> in a low GPU operating freq. It also negates the freq raise operation which
>>> is done before GuC firmware load. As a result GuC firmware load can time
>>> out. Such timeouts were seen in the GL #8062 bug below (where the PL1 power
>>> limit was enabled and set to a low value). Therefore disable the PL1 power
>>> limit when allowed by HW when loading GuC firmware.
>> v3 label missing in subject.
>>> v2:
>>>    - Take mutex (to disallow writes to power1_max) across GuC reset/fw load
>>>    - Add hwm_power_max_restore to error return code path
>>>
>>> v3 (Jani N):
>>>    - Add/remove explanatory comments
>>>    - Function renames
>>>    - Type corrections
>>>    - Locking annotation
>>>
>>> Link: https://gitlab.freedesktop.org/drm/intel/-/issues/8062
>>> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
>>> ---
>>>    drivers/gpu/drm/i915/gt/uc/intel_uc.c |  9 +++++++
>>>    drivers/gpu/drm/i915/i915_hwmon.c     | 39 +++++++++++++++++++++++++++
>>>    drivers/gpu/drm/i915/i915_hwmon.h     |  7 +++++
>>>    3 files changed, 55 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
>>> index 4ccb4be4c9cba..aa8e35a5636a0 100644
>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
>>> @@ -18,6 +18,7 @@
>>>    #include "intel_uc.h"
>>>      #include "i915_drv.h"
>>> +#include "i915_hwmon.h"
>>>      static const struct intel_uc_ops uc_ops_off;
>>>    static const struct intel_uc_ops uc_ops_on;
>>> @@ -461,6 +462,7 @@ static int __uc_init_hw(struct intel_uc *uc)
>>> 	struct intel_guc *guc = &uc->guc;
>>> 	struct intel_huc *huc = &uc->huc;
>>> 	int ret, attempts;
>>> +	bool pl1en;
>> Init to 'false' here
> See next comment.
>
>>
>>> 		GEM_BUG_ON(!intel_uc_supports_guc(uc));
>>> 	GEM_BUG_ON(!intel_uc_wants_guc(uc));
>>> @@ -491,6 +493,9 @@ static int __uc_init_hw(struct intel_uc *uc)
>>> 	else
>>> 		attempts = 1;
>>>    +	/* Disable a potentially low PL1 power limit to allow freq to be
>>> raised */
>>> +	i915_hwmon_power_max_disable(gt->i915, &pl1en);
>>> +
>>> 	intel_rps_raise_unslice(&uc_to_gt(uc)->rps);
>>> 		while (attempts--) {
>>> @@ -547,6 +552,8 @@ static int __uc_init_hw(struct intel_uc *uc)
>>> 		intel_rps_lower_unslice(&uc_to_gt(uc)->rps);
>>> 	}
>>>    +	i915_hwmon_power_max_restore(gt->i915, pl1en);
>>> +
>>> 	guc_info(guc, "submission %s\n", str_enabled_disabled(intel_uc_uses_guc_submission(uc)));
>>> 	guc_info(guc, "SLPC %s\n", str_enabled_disabled(intel_uc_uses_guc_slpc(uc)));
>>>    @@ -563,6 +570,8 @@ static int __uc_init_hw(struct intel_uc *uc)
>>> 	/* Return GT back to RPn */
>>> 	intel_rps_lower_unslice(&uc_to_gt(uc)->rps);
>>>    +	i915_hwmon_power_max_restore(gt->i915, pl1en);
>> if (pl1en)
>>
>>      i915_hwmon_power_max_enable().
> IMO it's better not to have checks in the main __uc_init_hw() function (if
> we do this we'll need to add 2 checks in __uc_init_hw()). If you really
> want we could do something like this inside
> i915_hwmon_power_max_disable/i915_hwmon_power_max_restore. But for now I
> am not making any changes.
ok.
>
> (I can send a patch with the changes if you want to take a look but IMO it
> will add more logic/code but without real benefits (it will save a rmw if
> the limit was already disabled, but IMO this code is called so infrequently
> (only during GuC resets) as to not have any significant impact)).
>
>>> +
>>> 	__uc_sanitize(uc);
>>> 		if (!ret) {
>>> diff --git a/drivers/gpu/drm/i915/i915_hwmon.c b/drivers/gpu/drm/i915/i915_hwmon.c
>>> index ee63a8fd88fc1..769b5bda4d53f 100644
>>> --- a/drivers/gpu/drm/i915/i915_hwmon.c
>>> +++ b/drivers/gpu/drm/i915/i915_hwmon.c
>>> @@ -444,6 +444,45 @@ hwm_power_write(struct hwm_drvdata *ddat, u32 attr, int chan, long val)
>>> 	}
>>>    }
>>>    +void i915_hwmon_power_max_disable(struct drm_i915_private *i915, bool
>>> *old)
>> Shouldn't we call this i915_hwmon_package_pl1_disable()?
> I did think of using "pl1" in the function name but then decided to retain
> "power_max" because other hwmon functions for PL1 limit also use
> "power_max" (hwm_power_max_read/hwm_power_max_write) and currently
> "hwmon_power_max" is mapped to the PL1 limit. So "power_max" is used to
> show that all these functions deal with the PL1 power limit.
>
> There is a comment in __uc_init_hw() explaining "power_max" means the PL1
> power limit.
ok.
>
>>> +	__acquires(i915->hwmon->hwmon_lock)
>>> +{
>>> +	struct i915_hwmon *hwmon = i915->hwmon;
>>> +	intel_wakeref_t wakeref;
>>> +	u32 r;
>>> +
>>> +	if (!hwmon || !i915_mmio_reg_valid(hwmon->rg.pkg_rapl_limit))
>>> +		return;
>>> +
>>> +	/* Take mutex to prevent concurrent hwm_power_max_write */
>>> +	mutex_lock(&hwmon->hwmon_lock);
>>> +
>>> +	with_intel_runtime_pm(hwmon->ddat.uncore->rpm, wakeref)
>>> +		r = intel_uncore_rmw(hwmon->ddat.uncore,
>>> +				     hwmon->rg.pkg_rapl_limit,
>>> +				     PKG_PWR_LIM_1_EN, 0);
>> Most of this code (lock and rmw parts) is already inside static void
>> hwm_locked_with_pm_intel_uncore_rmw() , can we reuse that here?
> This was the case in v1 of the patch:
>
> https://patchwork.freedesktop.org/patch/526393/?series=115003&rev=1
>
> But now this cannot be done because if you notice we acquire the mutex in
> i915_hwmon_power_max_disable() and release the mutex in
> i915_hwmon_power_max_restore().
>
> I explained the reason why this the mutex is handled this way in my reply
> to Jani Nikula here:
>
> https://patchwork.freedesktop.org/patch/526598/?series=115003&rev=2
>
> Quoting below:
>
> ```
>>> +	/* hwmon_lock mutex is unlocked in hwm_power_max_restore */
>> Not too happy about that... any better ideas?
> Afais, taking the mutex is the only fully correct solution (when we disable
> the power limit, userspace can go re-enable it). Examples of partly
> incorrect solutions (which don't take the mutex) include:
>
> a. Don't take the mutex, don't do anything, ignore any changes to the value
>     if it has changed during GuC reset/fw load (just overwrite the changed
>     value). Con: changed value is lost.
>
> b. Detect if the value has changed (the limit has been re-enabled) after we
>     have disabled the limit and in that case skip restoring the value. But
>     then someone can say why do we allow enabling the PL1 limit since we
>     want to disable it.
>
> Both these are very unlikely scenarios so they might work. But I would
> first like to explore if holding a mutex across GuC reset is prolebmatic
> since that is /the/ correct solution. But if anyone comes up with a reason
> why that cannot be done we can look at these other not completely correct
> options.

Well, one reason is that this is adding a lot of duplicate/non-reusable 
code needlessly. If it gets re-used elsewhere, that could lead to some 
weird situations where the lock could be held for an extended period of 
time and introduce dependencies. Also, how/why would the user modify 
this PL1 during guc load? The sysfs interfaces are not even ready at 
this point? Even if we consider this during a resume, the terminal will 
not be available to the user.

Thanks,

Vinay.

> ```
>
>>> +
>>> +	*old = !!(r & PKG_PWR_LIM_1_EN);
>>> +}
>>> +
>>> +void i915_hwmon_power_max_restore(struct drm_i915_private *i915, bool old)
>>> +	__releases(i915->hwmon->hwmon_lock)
>> We can just call this i915_hwmon_power_max_enable() and call whenever the
>> old value was actually enabled. That way, we have proper mirror functions.
> As I explained that would mean adding two checks in the main __uc_init_hw()
> function which I am trying to avoid. So we have disable/restore pair.
>
>>> +{
>>> +	struct i915_hwmon *hwmon = i915->hwmon;
>>> +	intel_wakeref_t wakeref;
>>> +
>>> +	if (!hwmon || !i915_mmio_reg_valid(hwmon->rg.pkg_rapl_limit))
>>> +		return;
>>> +
>>> +	with_intel_runtime_pm(hwmon->ddat.uncore->rpm, wakeref)
>>> +		intel_uncore_rmw(hwmon->ddat.uncore,
>>> +				 hwmon->rg.pkg_rapl_limit,
>>> +				 PKG_PWR_LIM_1_EN,
>>> +				 old ? PKG_PWR_LIM_1_EN : 0);
>> 3rd param should be 0 here, else we will end up clearing other bits.
> No see intel_uncore_rmw(), it will only clear the PKG_PWR_LIM_1_EN bit, so
> the code here is correct. intel_uncore_rmw() does:
>
>          val = (old & ~clear) | set;
Ok, just confusing, since you are also setting it with the 4th param.
>
> So for now I am not making any changes, if you feel strongly about
> something one way or another let me know. Anyway these comments should help
> you understand the patch better so take a look and we can go from there.
>
> Thanks.
> --
> Ashutosh
>
>>> +
>>> +	mutex_unlock(&hwmon->hwmon_lock);
>>> +}
>>> +
>>>    static umode_t
>>>    hwm_energy_is_visible(const struct hwm_drvdata *ddat, u32 attr)
>>>    {
>>> diff --git a/drivers/gpu/drm/i915/i915_hwmon.h b/drivers/gpu/drm/i915/i915_hwmon.h
>>> index 7ca9cf2c34c96..0fcb7de844061 100644
>>> --- a/drivers/gpu/drm/i915/i915_hwmon.h
>>> +++ b/drivers/gpu/drm/i915/i915_hwmon.h
>>> @@ -7,14 +7,21 @@
>>>    #ifndef __I915_HWMON_H__
>>>    #define __I915_HWMON_H__
>>>    +#include <linux/types.h>
>>> +
>>>    struct drm_i915_private;
>>> +struct intel_gt;
>>>      #if IS_REACHABLE(CONFIG_HWMON)
>>>    void i915_hwmon_register(struct drm_i915_private *i915);
>>>    void i915_hwmon_unregister(struct drm_i915_private *i915);
>>> +void i915_hwmon_power_max_disable(struct drm_i915_private *i915, bool *old);
>>> +void i915_hwmon_power_max_restore(struct drm_i915_private *i915, bool old);
>>>    #else
>>>    static inline void i915_hwmon_register(struct drm_i915_private *i915) { };
>>>    static inline void i915_hwmon_unregister(struct drm_i915_private *i915) { };
>>> +static inline void i915_hwmon_power_max_disable(struct drm_i915_private *i915, bool *old) { };
>>> +static inline void i915_hwmon_power_max_restore(struct drm_i915_private *i915, bool old) { };
>>>    #endif
>>>      #endif /* __I915_HWMON_H__ */

  reply	other threads:[~2023-03-25  0:06 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-16  3:59 [Intel-gfx] [PATCH] drm/i915/guc: Disable PL1 power limit when loading GuC firmware Ashutosh Dixit
2023-03-16  4:13 ` [Intel-gfx] ✗ Fi.CI.SPARSE: warning for drm/i915/guc: Disable PL1 power limit when loading GuC firmware (rev3) Patchwork
2023-03-16  4:37 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2023-03-16  9:05 ` [Intel-gfx] ✓ Fi.CI.IGT: " Patchwork
2023-03-24 18:15 ` [Intel-gfx] [PATCH] drm/i915/guc: Disable PL1 power limit when loading GuC firmware Belgaumkar, Vinay
2023-03-24 23:31   ` Dixit, Ashutosh
2023-03-25  0:06     ` Belgaumkar, Vinay [this message]
2023-03-27 16:57       ` Dixit, Ashutosh
2023-03-26 11:52     ` Rodrigo Vivi
2023-03-27 16:58       ` Dixit, Ashutosh
2023-03-27 17:47         ` Rodrigo Vivi
2023-03-28  9:14           ` Tvrtko Ursulin
2023-04-06  4:52             ` Dixit, Ashutosh
2023-04-06  4:50           ` Dixit, Ashutosh
  -- strict thread matches above, loose matches on Subject: below --
2023-03-11  0:33 Ashutosh Dixit
2023-03-11 20:46 ` Dixit, Ashutosh

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7b24abc4-30df-227c-bb6c-04de96e59f40@intel.com \
    --to=vinay.belgaumkar@intel.com \
    --cc=ashutosh.dixit@intel.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=intel-gfx@lists.freedesktop.org \
    --cc=rodrigo.vivi@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox