From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 15 Aug 2023 16:20:26 -0700
Message-ID: <87zg2rsxj9.wl-ashutosh.dixit@intel.com>
From: "Dixit, Ashutosh"
To: Andi Shyti
Cc: Matthew Brost, Badal Nilawar
Subject: Re: [PATCH v2 2/6] drm/xe/hwmon: Expose power attributes
References: <20230627183043.2024530-1-badal.nilawar@intel.com>
	<20230627183043.2024530-3-badal.nilawar@intel.com>
X-Mailing-List: linux-hwmon@vger.kernel.org

On Thu, 29 Jun 2023 07:09:31 -0700, Andi Shyti wrote:
> Hi Badal/Andi/Matt,
>
> > > +static int hwm_power_max_write(struct hwm_drvdata *ddat, long value)
> > > +{
> > > +	struct xe_hwmon *hwmon = ddat->hwmon;
> > > +	DEFINE_WAIT(wait);
> > > +	int ret = 0;
> > > +	u32 nval;
> > > +
> > > +	/* Block waiting for GuC reset to complete when needed */
> > > +	for (;;) {
>
> with a do...while() you shouldn't need a for(;;)... your choice,
> not going to beat on that.
>
> > > +		mutex_lock(&hwmon->hwmon_lock);
> > > +
> > > +		prepare_to_wait(&ddat->waitq, &wait, TASK_INTERRUPTIBLE);
> > > +
> > > +		if (!hwmon->ddat.reset_in_progress)
> > > +			break;
> > > +
> > > +		if (signal_pending(current)) {
> > > +			ret = -EINTR;
> > > +			break;
>
> cough! cough! unlock! cough! cough!

Why? It's fine as is.

> > > +		}
> > > +
> > > +		mutex_unlock(&hwmon->hwmon_lock);
> > > +
> > > +		schedule();
> > > +	}
> > > +	finish_wait(&ddat->waitq, &wait);
> > > +	if (ret)
> > > +		goto unlock;
> >
> > Any way to not open code this? We do similar with
> > xe_guc_submit_reset_wait, could we expose a global reset-in-progress
> > function at the gt level?
I don't know of any way to not open code this which is guaranteed to not
deadlock (not to say there are no other ways).

> > > +
> > > +	xe_device_mem_access_get(gt_to_xe(ddat->gt));
> > > +
> >
> > This certainly is an outer most thing, e.g. doing this under
> > hwmon->hwmon_lock seems dangerous. Again the upper levels of the stack
> > should do the xe_device_mem_access_get, which it does. Just pointing out
> > doing xe_device_mem_access_get/put under a lock isn't a good idea.

Agree, this is the only change we should make to this code.

> > Also the loop which acquires hwmon->hwmon_lock is confusing too.

Confusing but correct.

> > > +	/* Disable PL1 limit and verify, as limit cannot be disabled on all platforms */
> > > +	if (value == PL1_DISABLE) {
> > > +		process_hwmon_reg(ddat, pkg_rapl_limit, reg_rmw, &nval,
> > > +				  PKG_PWR_LIM_1_EN, 0);
> > > +		process_hwmon_reg(ddat, pkg_rapl_limit, reg_read, &nval,
> > > +				  PKG_PWR_LIM_1_EN, 0);
> > > +
> > > +		if (nval & PKG_PWR_LIM_1_EN)
> > > +			ret = -ENODEV;
> > > +		goto exit;
>
> cough! cough! lock! cough! cough!

Why? It's fine as is.

> > > +	}
> > > +
> > > +	/* Computation in 64-bits to avoid overflow. Round to nearest. */
> > > +	nval = DIV_ROUND_CLOSEST_ULL((u64)value << hwmon->scl_shift_power, SF_POWER);
> > > +	nval = PKG_PWR_LIM_1_EN | REG_FIELD_PREP(PKG_PWR_LIM_1, nval);
> > > +
> > > +	process_hwmon_reg(ddat, pkg_rapl_limit, reg_rmw, &nval,
> > > +			  PKG_PWR_LIM_1_EN | PKG_PWR_LIM_1, nval);
> > > +exit:
> > > +	xe_device_mem_access_put(gt_to_xe(ddat->gt));
> > > +unlock:
> > > +	mutex_unlock(&hwmon->hwmon_lock);
>
> mmhhh???... jokes apart this is so error prone that it will
> deadlock as soon as someone will think of editing this file :)

Why is it error prone and how will it deadlock? In fact this
prepare_to_wait/finish_wait pattern is widely used in the kernel (see e.g.
rpm_suspend) and is one of the few patterns guaranteed to not deadlock (see
also 6.2.5 "Advanced Sleeping" in LDD3 if needed). This is the same code
pattern we also implemented in i915 hwm_power_max_write.

Just to give some history: in i915 a scheme which held a mutex across GuC
reset was proposed first, but that was deemed too risky, so this more
complex scheme was implemented instead.

Regarding editing the code: it's kernel code involving locking which needs
to be edited carefully, and it's all confined to a single function (or maybe
a couple of functions), but otherwise yes, definitely not something to mess
around with :)

> It worried me already at the first part.
>
> Please, as Matt said, have a more linear locking here.

Afaiac I don't know of any other race-free way to do this other than what
was done in v2 (and in i915). So I want to discard the changes made in v3
and go back to the changes made in v2. If others can suggest other ways
which they can guarantee are race-free and correct and can R-b that code,
that's fine. But I can R-b the v2 code (with the only change of moving
xe_device_mem_access_get out of the lock). (Of course I am only talking
about R-b'ing the above scheme; other review comments may be valid.)

Badal, also, if there are questions about this scheme, maybe we should move
this to a separate patch as was done in i915? We can just return -EAGAIN as
in 1b44019a93e2.

Thanks.
--
Ashutosh