From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 9DDCEC433FE for ; Fri, 21 Oct 2022 18:41:35 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 4D3EF10E324; Fri, 21 Oct 2022 18:41:34 +0000 (UTC) Received: from mga11.intel.com (mga11.intel.com [192.55.52.93]) by gabe.freedesktop.org (Postfix) with ESMTPS id B662510E324; Fri, 21 Oct 2022 18:41:26 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1666377686; x=1697913686; h=date:message-id:from:to:cc:subject:in-reply-to: references:mime-version; bh=adxpXhwDayaq/ZAkuTsm3qStreKzecjAAOjbaBl/1mU=; b=WkFVZUeiUJ2+Ktnd7t1FSw3Z4IMbmRSXlRctcdb/gMmK6mWWHEvbbVDI DwrtJiNG2xRDyAujnGhFVSs7YyMe0E3AOl0+9sWCBiuzS0FZlkBUw4eFe RKGlfeVnleNyj0Hhf3vZFjdWcJfqPwsZ1O2sYoGt91NaauQw7M8p96CP5 kZOfjOC07vIs2w8A52JdbBYpSOZSo4jjFuXNMft0Fd+FVpLEN+rH6muJD 7yI6kLbphyZRxk3tCG66fiRQXyswE7oY1EXB6s0vInzXVW/R4LwS+N5cT vextFGm40LOjQBadsycDITqvMbYfLfuaYfkZ3I404sruAVbMeROVeZlZ5 A==; X-IronPort-AV: E=McAfee;i="6500,9779,10507"; a="304691628" X-IronPort-AV: E=Sophos;i="5.95,202,1661842800"; d="scan'208";a="304691628" Received: from orsmga007.jf.intel.com ([10.7.209.58]) by fmsmga102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 Oct 2022 11:41:20 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10507"; a="625451487" X-IronPort-AV: E=Sophos;i="5.95,202,1661842800"; d="scan'208";a="625451487" Received: from adixit-mobl.amr.corp.intel.com (HELO adixit-arch.intel.com) ([10.255.230.194]) by orsmga007-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 Oct 2022 11:41:19 -0700 Date: Fri, 21 Oct 2022 11:40:36 -0700 Message-ID: <87y1t9ow9n.wl-ashutosh.dixit@intel.com> From: "Dixit, Ashutosh" To: "Belgaumkar, Vinay" In-Reply-To: <2287a44e-80df-ef3c-1d4e-5ee80a62e381@intel.com> References: <20221020002944.4228-1-vinay.belgaumkar@intel.com> <874jvyqr9o.wl-ashutosh.dixit@intel.com> <871qr2qd7y.wl-ashutosh.dixit@intel.com> <2287a44e-80df-ef3c-1d4e-5ee80a62e381@intel.com> User-Agent: Wanderlust/2.15.9 (Almost Unreal) SEMI-EPG/1.14.7 (Harue) FLIM-LB/1.14.9 (=?ISO-8859-4?Q?Goj=F2?=) APEL-LB/10.8 EasyPG/1.0.0 Emacs/28.2 (x86_64-pc-linux-gnu) MULE/6.0 (HANACHIRUSATO) MIME-Version: 1.0 (generated by SEMI-EPG 1.14.7 - "Harue") Content-Type: text/plain; charset=US-ASCII Subject: Re: [Intel-gfx] [PATCH v3] drm/i915/slpc: Optmize waitboost for SLPC X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" On Fri, 21 Oct 2022 11:24:42 -0700, Belgaumkar, Vinay wrote: > > > On 10/20/2022 4:36 PM, Dixit, Ashutosh wrote: > > On Thu, 20 Oct 2022 13:16:00 -0700, Belgaumkar, Vinay wrote: > >> On 10/20/2022 11:33 AM, Dixit, Ashutosh wrote: > >>> On Wed, 19 Oct 2022 17:29:44 -0700, Vinay Belgaumkar wrote: > >>> Hi Vinay, > >>> > >>>> Waitboost (when SLPC is enabled) results in a H2G message. This can result > >>>> in thousands of messages during a stress test and fill up an already full > >>>> CTB. There is no need to request for RP0 if GuC is already requesting the > >>>> same. > >>> But how are we sure that the freq will remain at RP0 in the future (when > >>> the waiting request or any requests which are ahead execute)? > >>> > >>> In the current waitboost implementation, set_param is sent to GuC ahead of > >>> the waiting request to ensure that the freq would be max when this waiting > >>> request executed on the GPU and the freq is kept at max till this request > >>> retires (considering just one waiting request). How can we ensure this if > >>> we don't send the waitboost set_param to GuC? > >> There is no way to guarantee the frequency will remain at RP0 till the > >> request retires. As a theoretical example, lets say the request boosted > >> freq to RP0, but a user changed min freq using sysfs immediately after. > > That would be a bug. If waitboost is in progress and in the middle user > > changed min freq, I would expect the freq to revert to the new min only > > after the waitboost phase was over. > > The problem here is that GuC is unaware of this "boosting" > phenomenon. Setting the min_freq_softlimit as well to boost when we send a > boost request might help with this issue. > > > > > In any case, I am not referring to this case. Since FW controls the freq > > there is nothing preventing FW to change the freq unless we raise min to > > max which is what waitboost does. > Ok, so maybe the solution here is to check if min_softlimit is already at > boost freq, as it tracks the min freq changes. That should take care of > server parts automatically as well. Correct, yes that would be the right way to do it. Thanks. -- Ashutosh > >> Waitboost is done by a pending request to "hurry" the current requests. If > >> GT is already at boost frequency, that purpose is served. > > FW can bring the freq down later before the waiting request is scheduled. > >> Also, host algorithm already has this optimization as well. > > Host turbo is different from SLPC. Host turbo controls the freq algorithm > > so it knows freq will not come down till it itself brings the freq > > down. Unlike SLPC where FW is controling the freq. Therefore host turbo > > doesn't ever need to do a MMIO read but only needs to refer to its own > > state (rps->cur_freq etc.). > True. Host algorithm has a periodic timer where it updates frequency. Here, > it checks num_waiters and sets client_boost every time that is non-zero. > >>> I had assumed we'll do this optimization for server parts where min is > >>> already RP0 in which case we can completely disable waitboost. But this > >>> patch is something else. > > Hopefully the softlimit changes above will help with client and server. > > Thanks, > > Vinay. > > > Thanks. > > -- > > Ashutosh > > > >>>> v2: Add the tracing back, and check requested freq > >>>> in the worker thread (Tvrtko) > >>>> v3: Check requested freq in dec_waiters as well > >>>> > >>>> Signed-off-by: Vinay Belgaumkar > >>>> --- > >>>> drivers/gpu/drm/i915/gt/intel_rps.c | 3 +++ > >>>> drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c | 14 +++++++++++--- > >>>> 2 files changed, 14 insertions(+), 3 deletions(-) > >>>> > >>>> diff --git a/drivers/gpu/drm/i915/gt/intel_rps.c b/drivers/gpu/drm/i915/gt/intel_rps.c > >>>> index fc23c562d9b2..18b75cf08d1b 100644 > >>>> --- a/drivers/gpu/drm/i915/gt/intel_rps.c > >>>> +++ b/drivers/gpu/drm/i915/gt/intel_rps.c > >>>> @@ -1016,6 +1016,9 @@ void intel_rps_boost(struct i915_request *rq) > >>>> if (rps_uses_slpc(rps)) { > >>>> slpc = rps_to_slpc(rps); > >>>> > >>>> + GT_TRACE(rps_to_gt(rps), "boost fence:%llx:%llx\n", > >>>> + rq->fence.context, rq->fence.seqno); > >>>> + > >>>> /* Return if old value is non zero */ > >>>> if (!atomic_fetch_inc(&slpc->num_waiters)) > >>>> schedule_work(&slpc->boost_work); > >>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c > >>>> index b7cdeec44bd3..9dbdbab1515a 100644 > >>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c > >>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c > >>>> @@ -227,14 +227,19 @@ static int slpc_force_min_freq(struct intel_guc_slpc *slpc, u32 freq) > >>>> static void slpc_boost_work(struct work_struct *work) > >>>> { > >>>> struct intel_guc_slpc *slpc = container_of(work, typeof(*slpc), boost_work); > >>>> + struct intel_rps *rps = &slpc_to_gt(slpc)->rps; > >>>> int err; > >>>> > >>>> /* > >>>> * Raise min freq to boost. It's possible that > >>>> * this is greater than current max. But it will > >>>> * certainly be limited by RP0. An error setting > >>>> - * the min param is not fatal. > >>>> + * the min param is not fatal. No need to boost > >>>> + * if we are already requesting it. > >>>> */ > >>>> + if (intel_rps_get_requested_frequency(rps) == slpc->boost_freq) > >>>> + return; > >>>> + > >>>> mutex_lock(&slpc->lock); > >>>> if (atomic_read(&slpc->num_waiters)) { > >>>> err = slpc_force_min_freq(slpc, slpc->boost_freq); > >>>> @@ -728,6 +733,7 @@ int intel_guc_slpc_set_boost_freq(struct intel_guc_slpc *slpc, u32 val) > >>>> > >>>> void intel_guc_slpc_dec_waiters(struct intel_guc_slpc *slpc) > >>>> { > >>>> + struct intel_rps *rps = &slpc_to_gt(slpc)->rps; > >>>> /* > >>>> * Return min back to the softlimit. > >>>> * This is called during request retire, > >>>> @@ -735,8 +741,10 @@ void intel_guc_slpc_dec_waiters(struct intel_guc_slpc *slpc) > >>>> * set_param fails. > >>>> */ > >>>> mutex_lock(&slpc->lock); > >>>> - if (atomic_dec_and_test(&slpc->num_waiters)) > >>>> - slpc_force_min_freq(slpc, slpc->min_freq_softlimit); > >>>> + if (atomic_dec_and_test(&slpc->num_waiters)) { > >>>> + if (intel_rps_get_requested_frequency(rps) != slpc->min_freq_softlimit) > >>>> + slpc_force_min_freq(slpc, slpc->min_freq_softlimit); > >>>> + } > >>>> mutex_unlock(&slpc->lock); > >>>> } > >>>> > >>>> -- > >>>> 2.35.1 > >>>> From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 43687C38A2D for ; Fri, 21 Oct 2022 18:41:41 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 2903410E5E2; Fri, 21 Oct 2022 18:41:35 +0000 (UTC) Received: from mga11.intel.com (mga11.intel.com [192.55.52.93]) by gabe.freedesktop.org (Postfix) with ESMTPS id B662510E324; Fri, 21 Oct 2022 18:41:26 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1666377686; x=1697913686; h=date:message-id:from:to:cc:subject:in-reply-to: references:mime-version; bh=adxpXhwDayaq/ZAkuTsm3qStreKzecjAAOjbaBl/1mU=; b=WkFVZUeiUJ2+Ktnd7t1FSw3Z4IMbmRSXlRctcdb/gMmK6mWWHEvbbVDI DwrtJiNG2xRDyAujnGhFVSs7YyMe0E3AOl0+9sWCBiuzS0FZlkBUw4eFe RKGlfeVnleNyj0Hhf3vZFjdWcJfqPwsZ1O2sYoGt91NaauQw7M8p96CP5 kZOfjOC07vIs2w8A52JdbBYpSOZSo4jjFuXNMft0Fd+FVpLEN+rH6muJD 7yI6kLbphyZRxk3tCG66fiRQXyswE7oY1EXB6s0vInzXVW/R4LwS+N5cT vextFGm40LOjQBadsycDITqvMbYfLfuaYfkZ3I404sruAVbMeROVeZlZ5 A==; X-IronPort-AV: E=McAfee;i="6500,9779,10507"; a="304691628" X-IronPort-AV: E=Sophos;i="5.95,202,1661842800"; d="scan'208";a="304691628" Received: from orsmga007.jf.intel.com ([10.7.209.58]) by fmsmga102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 Oct 2022 11:41:20 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10507"; a="625451487" X-IronPort-AV: E=Sophos;i="5.95,202,1661842800"; d="scan'208";a="625451487" Received: from adixit-mobl.amr.corp.intel.com (HELO adixit-arch.intel.com) ([10.255.230.194]) by orsmga007-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 Oct 2022 11:41:19 -0700 Date: Fri, 21 Oct 2022 11:40:36 -0700 Message-ID: <87y1t9ow9n.wl-ashutosh.dixit@intel.com> From: "Dixit, Ashutosh" To: "Belgaumkar, Vinay" Subject: Re: [Intel-gfx] [PATCH v3] drm/i915/slpc: Optmize waitboost for SLPC In-Reply-To: <2287a44e-80df-ef3c-1d4e-5ee80a62e381@intel.com> References: <20221020002944.4228-1-vinay.belgaumkar@intel.com> <874jvyqr9o.wl-ashutosh.dixit@intel.com> <871qr2qd7y.wl-ashutosh.dixit@intel.com> <2287a44e-80df-ef3c-1d4e-5ee80a62e381@intel.com> User-Agent: Wanderlust/2.15.9 (Almost Unreal) SEMI-EPG/1.14.7 (Harue) FLIM-LB/1.14.9 (=?ISO-8859-4?Q?Goj=F2?=) APEL-LB/10.8 EasyPG/1.0.0 Emacs/28.2 (x86_64-pc-linux-gnu) MULE/6.0 (HANACHIRUSATO) MIME-Version: 1.0 (generated by SEMI-EPG 1.14.7 - "Harue") Content-Type: text/plain; charset=US-ASCII X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Tvrtko Ursulin , intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" On Fri, 21 Oct 2022 11:24:42 -0700, Belgaumkar, Vinay wrote: > > > On 10/20/2022 4:36 PM, Dixit, Ashutosh wrote: > > On Thu, 20 Oct 2022 13:16:00 -0700, Belgaumkar, Vinay wrote: > >> On 10/20/2022 11:33 AM, Dixit, Ashutosh wrote: > >>> On Wed, 19 Oct 2022 17:29:44 -0700, Vinay Belgaumkar wrote: > >>> Hi Vinay, > >>> > >>>> Waitboost (when SLPC is enabled) results in a H2G message. This can result > >>>> in thousands of messages during a stress test and fill up an already full > >>>> CTB. There is no need to request for RP0 if GuC is already requesting the > >>>> same. > >>> But how are we sure that the freq will remain at RP0 in the future (when > >>> the waiting request or any requests which are ahead execute)? > >>> > >>> In the current waitboost implementation, set_param is sent to GuC ahead of > >>> the waiting request to ensure that the freq would be max when this waiting > >>> request executed on the GPU and the freq is kept at max till this request > >>> retires (considering just one waiting request). How can we ensure this if > >>> we don't send the waitboost set_param to GuC? > >> There is no way to guarantee the frequency will remain at RP0 till the > >> request retires. As a theoretical example, lets say the request boosted > >> freq to RP0, but a user changed min freq using sysfs immediately after. > > That would be a bug. If waitboost is in progress and in the middle user > > changed min freq, I would expect the freq to revert to the new min only > > after the waitboost phase was over. > > The problem here is that GuC is unaware of this "boosting" > phenomenon. Setting the min_freq_softlimit as well to boost when we send a > boost request might help with this issue. > > > > > In any case, I am not referring to this case. Since FW controls the freq > > there is nothing preventing FW to change the freq unless we raise min to > > max which is what waitboost does. > Ok, so maybe the solution here is to check if min_softlimit is already at > boost freq, as it tracks the min freq changes. That should take care of > server parts automatically as well. Correct, yes that would be the right way to do it. Thanks. -- Ashutosh > >> Waitboost is done by a pending request to "hurry" the current requests. If > >> GT is already at boost frequency, that purpose is served. > > FW can bring the freq down later before the waiting request is scheduled. > >> Also, host algorithm already has this optimization as well. > > Host turbo is different from SLPC. Host turbo controls the freq algorithm > > so it knows freq will not come down till it itself brings the freq > > down. Unlike SLPC where FW is controling the freq. Therefore host turbo > > doesn't ever need to do a MMIO read but only needs to refer to its own > > state (rps->cur_freq etc.). > True. Host algorithm has a periodic timer where it updates frequency. Here, > it checks num_waiters and sets client_boost every time that is non-zero. > >>> I had assumed we'll do this optimization for server parts where min is > >>> already RP0 in which case we can completely disable waitboost. But this > >>> patch is something else. > > Hopefully the softlimit changes above will help with client and server. > > Thanks, > > Vinay. > > > Thanks. > > -- > > Ashutosh > > > >>>> v2: Add the tracing back, and check requested freq > >>>> in the worker thread (Tvrtko) > >>>> v3: Check requested freq in dec_waiters as well > >>>> > >>>> Signed-off-by: Vinay Belgaumkar > >>>> --- > >>>> drivers/gpu/drm/i915/gt/intel_rps.c | 3 +++ > >>>> drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c | 14 +++++++++++--- > >>>> 2 files changed, 14 insertions(+), 3 deletions(-) > >>>> > >>>> diff --git a/drivers/gpu/drm/i915/gt/intel_rps.c b/drivers/gpu/drm/i915/gt/intel_rps.c > >>>> index fc23c562d9b2..18b75cf08d1b 100644 > >>>> --- a/drivers/gpu/drm/i915/gt/intel_rps.c > >>>> +++ b/drivers/gpu/drm/i915/gt/intel_rps.c > >>>> @@ -1016,6 +1016,9 @@ void intel_rps_boost(struct i915_request *rq) > >>>> if (rps_uses_slpc(rps)) { > >>>> slpc = rps_to_slpc(rps); > >>>> > >>>> + GT_TRACE(rps_to_gt(rps), "boost fence:%llx:%llx\n", > >>>> + rq->fence.context, rq->fence.seqno); > >>>> + > >>>> /* Return if old value is non zero */ > >>>> if (!atomic_fetch_inc(&slpc->num_waiters)) > >>>> schedule_work(&slpc->boost_work); > >>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c > >>>> index b7cdeec44bd3..9dbdbab1515a 100644 > >>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c > >>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c > >>>> @@ -227,14 +227,19 @@ static int slpc_force_min_freq(struct intel_guc_slpc *slpc, u32 freq) > >>>> static void slpc_boost_work(struct work_struct *work) > >>>> { > >>>> struct intel_guc_slpc *slpc = container_of(work, typeof(*slpc), boost_work); > >>>> + struct intel_rps *rps = &slpc_to_gt(slpc)->rps; > >>>> int err; > >>>> > >>>> /* > >>>> * Raise min freq to boost. It's possible that > >>>> * this is greater than current max. But it will > >>>> * certainly be limited by RP0. An error setting > >>>> - * the min param is not fatal. > >>>> + * the min param is not fatal. No need to boost > >>>> + * if we are already requesting it. > >>>> */ > >>>> + if (intel_rps_get_requested_frequency(rps) == slpc->boost_freq) > >>>> + return; > >>>> + > >>>> mutex_lock(&slpc->lock); > >>>> if (atomic_read(&slpc->num_waiters)) { > >>>> err = slpc_force_min_freq(slpc, slpc->boost_freq); > >>>> @@ -728,6 +733,7 @@ int intel_guc_slpc_set_boost_freq(struct intel_guc_slpc *slpc, u32 val) > >>>> > >>>> void intel_guc_slpc_dec_waiters(struct intel_guc_slpc *slpc) > >>>> { > >>>> + struct intel_rps *rps = &slpc_to_gt(slpc)->rps; > >>>> /* > >>>> * Return min back to the softlimit. > >>>> * This is called during request retire, > >>>> @@ -735,8 +741,10 @@ void intel_guc_slpc_dec_waiters(struct intel_guc_slpc *slpc) > >>>> * set_param fails. > >>>> */ > >>>> mutex_lock(&slpc->lock); > >>>> - if (atomic_dec_and_test(&slpc->num_waiters)) > >>>> - slpc_force_min_freq(slpc, slpc->min_freq_softlimit); > >>>> + if (atomic_dec_and_test(&slpc->num_waiters)) { > >>>> + if (intel_rps_get_requested_frequency(rps) != slpc->min_freq_softlimit) > >>>> + slpc_force_min_freq(slpc, slpc->min_freq_softlimit); > >>>> + } > >>>> mutex_unlock(&slpc->lock); > >>>> } > >>>> > >>>> -- > >>>> 2.35.1 > >>>>