From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id CA85FC77B61 for ; Mon, 10 Apr 2023 22:43:12 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 57E1210E43B; Mon, 10 Apr 2023 22:43:12 +0000 (UTC) Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by gabe.freedesktop.org (Postfix) with ESMTPS id 40D6410E43B; Mon, 10 Apr 2023 22:43:11 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1681166591; x=1712702591; h=date:message-id:from:to:cc:subject:in-reply-to: references:mime-version; bh=xk0edMKFeflPpIytVjYpBJTZht+BUxxlmyGglSwLMvk=; b=cB98wI5K+NoVQIts2fGn8Ag2EW6BHcXbbnOb7CcrWBaSZUtMdanMa3C7 vyqQkrMKBaRF6m/CMKUx44rxjoDkpvcB32SMk20dICwps5vV0chUDK84p ocJXSJF32tjuXTrmNa864Mqj5jO+i73FPqMhRRbdTwwkj2J1pU+LruZWD HRz+Nt8nV1sMAV/zIlQAqsTmf/vK0uOAbNVM+2tFY4VMSbHEAT293bk7X ZdZtRzoMPYijrmRRLrWCILjEcnZRe5qEoO/KUZslCr2VjqyllUgCjKyCV CPFl7iwLEz8dotlpv9QpbWJdoDqO2NgD5VAWco80CqHIAXosO51mkX80K g==; X-IronPort-AV: E=McAfee;i="6600,9927,10676"; a="342234762" X-IronPort-AV: E=Sophos;i="5.98,333,1673942400"; d="scan'208";a="342234762" Received: from orsmga003.jf.intel.com ([10.7.209.27]) by fmsmga104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Apr 2023 15:43:10 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10676"; a="638598776" X-IronPort-AV: E=Sophos;i="5.98,333,1673942400"; d="scan'208";a="638598776" Received: from adixit-mobl.amr.corp.intel.com (HELO adixit-arch.intel.com) ([10.251.5.8]) by orsmga003-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Apr 2023 15:43:10 -0700 Date: Mon, 10 Apr 2023 15:40:55 -0700 Message-ID: <877cuje4y0.wl-ashutosh.dixit@intel.com> From: "Dixit, Ashutosh" To: Rodrigo Vivi In-Reply-To: References: <20230406044522.3108359-1-ashutosh.dixit@intel.com> <20230406044522.3108359-4-ashutosh.dixit@intel.com> User-Agent: Wanderlust/2.15.9 (Almost Unreal) SEMI-EPG/1.14.7 (Harue) FLIM-LB/1.14.9 (=?ISO-8859-4?Q?Goj=F2?=) APEL-LB/10.8 EasyPG/1.0.0 Emacs/28.2 (x86_64-pc-linux-gnu) MULE/6.0 (HANACHIRUSATO) MIME-Version: 1.0 (generated by SEMI-EPG 1.14.7 - "Harue") Content-Type: text/plain; charset=US-ASCII Subject: Re: [Intel-gfx] [PATCH 3/3] drm/i915/hwmon: Block waiting for GuC reset to complete X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" On Fri, 07 Apr 2023 04:04:06 -0700, Rodrigo Vivi wrote: > Hi Rodrigo, > On Wed, Apr 05, 2023 at 09:45:22PM -0700, Ashutosh Dixit wrote: > > Instead of erroring out when GuC reset is in progress, block waiting for > > GuC reset to complete which is a more reasonable uapi behavior. > > > > Signed-off-by: Ashutosh Dixit > > --- > > drivers/gpu/drm/i915/i915_hwmon.c | 13 ++++++++++--- > > 1 file changed, 10 insertions(+), 3 deletions(-) > > > > diff --git a/drivers/gpu/drm/i915/i915_hwmon.c b/drivers/gpu/drm/i915/i915_hwmon.c > > index 9ab8971679fe3..4343efb48e61b 100644 > > --- a/drivers/gpu/drm/i915/i915_hwmon.c > > +++ b/drivers/gpu/drm/i915/i915_hwmon.c > > @@ -51,6 +51,7 @@ struct hwm_drvdata { > > char name[12]; > > int gt_n; > > bool reset_in_progress; > > + wait_queue_head_t wqh; > > }; > > > > struct i915_hwmon { > > @@ -400,10 +401,15 @@ hwm_power_max_write(struct hwm_drvdata *ddat, long val) > > int ret = 0; > > u32 nval; > > > > +retry: > > mutex_lock(&hwmon->hwmon_lock); > > if (hwmon->ddat.reset_in_progress) { > > - ret = -EAGAIN; > > - goto unlock; > > + mutex_unlock(&hwmon->hwmon_lock); > > + ret = wait_event_interruptible(ddat->wqh, > > + !hwmon->ddat.reset_in_progress); > > this is indeed very clever! Not clever, see below :/ > my fear is probably due to the lack of knowledge on this wait queue, but > I'm wondering what could go wrong if due to some funny race you enter this > check right after wake_up_all below has passed and then you will be here > indefinitely waiting... You are absolutely right, there is indeed a race in the patch because in the above code when we drop the mutex (mutex_unlock) the wake_up_all can happen before we have queued ourselves for the wake up. Solving this race needs a more complicated prepare_to_wait/finish_wait sequence which I have gone ahead and implemented in patch v2. The v2 code is also a standard code pattern and the pattern I have implemented is basically the same as that in intel_guc_wait_for_pending_msg() in i915 which I liked. I have read in several places (e.g. in the Advanced Sleeping section in https://static.lwn.net/images/pdf/LDD3/ch06.pdf and in kernel documentation for try_to_wake_up()) that this sequence will avoid the race (between schedule() and wake_up()). The crucial difference from the v1 patch is that in v2 the mutex is dropped after we queue ourselves in prepare_to_wait() just before calling schedule_timeout(). > maybe just use the timeout version to be on the safeside and then return the > -EAGAIN on timeout? Also incorporated timeout in the new version. All code paths in the new patch have been tested. Thanks. -- Ashutosh > > + if (ret) > > + return ret; > > + goto retry; > > } > > wakeref = intel_runtime_pm_get(ddat->uncore->rpm); > > > > @@ -426,7 +432,6 @@ hwm_power_max_write(struct hwm_drvdata *ddat, long val) > > PKG_PWR_LIM_1_EN | PKG_PWR_LIM_1, nval); > > exit: > > intel_runtime_pm_put(ddat->uncore->rpm, wakeref); > > -unlock: > > mutex_unlock(&hwmon->hwmon_lock); > > return ret; > > } > > @@ -508,6 +513,7 @@ void i915_hwmon_power_max_restore(struct drm_i915_private *i915, bool old) > > intel_uncore_rmw(hwmon->ddat.uncore, hwmon->rg.pkg_rapl_limit, > > PKG_PWR_LIM_1_EN, old ? PKG_PWR_LIM_1_EN : 0); > > hwmon->ddat.reset_in_progress = false; > > + wake_up_all(&hwmon->ddat.wqh); > > > > mutex_unlock(&hwmon->hwmon_lock); > > } > > @@ -784,6 +790,7 @@ void i915_hwmon_register(struct drm_i915_private *i915) > > ddat->uncore = &i915->uncore; > > snprintf(ddat->name, sizeof(ddat->name), "i915"); > > ddat->gt_n = -1; > > + init_waitqueue_head(&ddat->wqh); > > > > for_each_gt(gt, i915, i) { > > ddat_gt = hwmon->ddat_gt + i; > > -- > > 2.38.0 > >