Date: Wed, 19 Apr 2023 15:10:44 -0700
Message-ID: <871qkfo74r.wl-ashutosh.dixit@intel.com>
From: "Dixit, Ashutosh"
To: Tvrtko Ursulin
Cc: intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org, Rodrigo Vivi
In-Reply-To: <340d7a5f-0b38-3c40-77b8-ab825a7b5fef@linux.intel.com>
References: <20230410223509.3593109-1-ashutosh.dixit@intel.com> <20230410223509.3593109-4-ashutosh.dixit@intel.com> <340d7a5f-0b38-3c40-77b8-ab825a7b5fef@linux.intel.com>
Subject: Re: [Intel-gfx] [PATCH 3/3] drm/i915/hwmon: Block waiting for GuC reset to complete

On Wed, 19 Apr 2023 06:21:27 -0700, Tvrtko Ursulin wrote:
>

Hi Tvrtko,

> On 10/04/2023 23:35, Ashutosh Dixit wrote:
> > Instead of erroring out when GuC reset is in progress, block waiting for
> > GuC reset to complete which is a more reasonable uapi behavior.
> >
> > v2: Avoid race between wake_up_all and waiting for wakeup (Rodrigo)
> >
> > Signed-off-by: Ashutosh Dixit
> > ---
> >  drivers/gpu/drm/i915/i915_hwmon.c | 38 +++++++++++++++++++++++++++----
> >  1 file changed, 33 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_hwmon.c b/drivers/gpu/drm/i915/i915_hwmon.c
> > index 9ab8971679fe3..8471a667dfc71 100644
> > --- a/drivers/gpu/drm/i915/i915_hwmon.c
> > +++ b/drivers/gpu/drm/i915/i915_hwmon.c
> > @@ -51,6 +51,7 @@ struct hwm_drvdata {
> >  	char name[12];
> >  	int gt_n;
> >  	bool reset_in_progress;
> > +	wait_queue_head_t waitq;
> >  };
> >  struct i915_hwmon {
> > @@ -395,16 +396,41 @@ hwm_power_max_read(struct hwm_drvdata *ddat, long *val)
> >  static int
> >  hwm_power_max_write(struct hwm_drvdata *ddat, long val)
> >  {
> > +#define GUC_RESET_TIMEOUT msecs_to_jiffies(2000)
> > +
> > +	int ret = 0, timeout = GUC_RESET_TIMEOUT;
>
> Patch looks good to me

Great, thanks :)

> apart that I am not sure what is the purpose of the timeout? This is just
> the sysfs write path or has more callers?

It is just the sysfs path, but the sysfs is also accessed by the oneAPI
stack (Level 0). In the initial version I didn't have the timeout either,
thinking that the app can send a signal to the blocked thread to unblock
it. I introduced the timeout after Rodrigo brought it up, and I am now
thinking maybe it's better to have the timeout in the driver since the app
has no knowledge of how long GuC resets can take. But I can remove it if
you think it's not needed.

> If the
> former perhaps it would be better to just use interruptible everything
> (mutex and sleep) and wait for as long as it takes or until user presses
> Ctrl-C?

Now we are not holding the mutexes for long, just long enough to do the
register rmw's. So we are not holding the mutex across GuC reset as we
were originally. Therefore I am thinking mutex_lock_interruptible is not
needed? The sleep is already interruptible (TASK_INTERRUPTIBLE).

Anyway please let me know if you think we need to change anything.

Thanks.
--
Ashutosh

> >  	struct i915_hwmon *hwmon = ddat->hwmon;
> >  	intel_wakeref_t wakeref;
> > -	int ret = 0;
> > +	DEFINE_WAIT(wait);
> >  	u32 nval;
> > -	mutex_lock(&hwmon->hwmon_lock);
> > -	if (hwmon->ddat.reset_in_progress) {
> > -		ret = -EAGAIN;
> > -		goto unlock;
> > +	/* Block waiting for GuC reset to complete when needed */
> > +	for (;;) {
> > +		mutex_lock(&hwmon->hwmon_lock);
> > +
> > +		prepare_to_wait(&ddat->waitq, &wait, TASK_INTERRUPTIBLE);
> > +
> > +		if (!hwmon->ddat.reset_in_progress)
> > +			break;
> > +
> > +		if (signal_pending(current)) {
> > +			ret = -EINTR;
> > +			break;
> > +		}
> > +
> > +		if (!timeout) {
> > +			ret = -ETIME;
> > +			break;
> > +		}
> > +
> > +		mutex_unlock(&hwmon->hwmon_lock);
> > +
> > +		timeout = schedule_timeout(timeout);
> >  	}
> > +	finish_wait(&ddat->waitq, &wait);
> > +	if (ret)
> > +		goto unlock;
> > +
> >  	wakeref = intel_runtime_pm_get(ddat->uncore->rpm);
> >  	/* Disable PL1 limit and verify, because the limit cannot be
> >  	disabled on all platforms */
> > @@ -508,6 +534,7 @@ void i915_hwmon_power_max_restore(struct drm_i915_private *i915, bool old)
> >  	intel_uncore_rmw(hwmon->ddat.uncore, hwmon->rg.pkg_rapl_limit,
> >  			 PKG_PWR_LIM_1_EN, old ? PKG_PWR_LIM_1_EN : 0);
> >  	hwmon->ddat.reset_in_progress = false;
> > +	wake_up_all(&hwmon->ddat.waitq);
> >  	mutex_unlock(&hwmon->hwmon_lock);
> >  }
> > @@ -784,6 +811,7 @@ void i915_hwmon_register(struct drm_i915_private *i915)
> >  	ddat->uncore = &i915->uncore;
> >  	snprintf(ddat->name, sizeof(ddat->name), "i915");
> >  	ddat->gt_n = -1;
> > +	init_waitqueue_head(&ddat->waitq);
> >  	for_each_gt(gt, i915, i) {
> >  		ddat_gt = hwmon->ddat_gt + i;