Date: Wed, 29 Mar 2023 22:50:09 -0700
Message-ID: <87cz4qlre6.wl-ashutosh.dixit@intel.com>
From: "Dixit, Ashutosh"
To: intel-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org, Rodrigo Vivi
In-Reply-To: <20230328233543.1091127-1-ashutosh.dixit@intel.com>
References: <20230328233543.1091127-1-ashutosh.dixit@intel.com>
Subject: Re: [Intel-gfx] [PATCH] drm/i915/hwmon: Use 0 to designate disabled PL1 power limit

On Tue, 28 Mar 2023 16:35:43 -0700, Ashutosh Dixit wrote:
>
> On ATSM the PL1 limit is disabled at power up. The previous uapi assumed
> that the PL1 limit is always enabled and therefore did not have a notion of
> a disabled PL1 limit. This results in erroneous PL1 limit values when the
> PL1 limit is disabled. For example at power up, the disabled ATSM PL1 limit
> was previously shown as 0 which means a low PL1 limit whereas the limit
> being disabled actually implies a high effective PL1 limit value.
>
> To get round this problem, the PL1 limit uapi is expanded to include a
> special value 0 to designate a disabled PL1 limit.

This patch is another attempt to show when the PL1 power limit is disabled
and to disable it when needed. Previous abandoned attempts to do this are
[1] and [2]. The preferred way to do this was [2], but that was NAK'd by
the hwmon folks (see [2]). That is why here we fall back on the approach in
[1].
This patch is identical to [1] except that the value used to disable the
PL1 limit has been changed to 0 (from -1 in [1]), as was suggested in [2]
(both -1 and 0 seem ok for the purpose).

> Bug: https://gitlab.freedesktop.org/drm/intel/-/issues/8062
> Bug: https://gitlab.freedesktop.org/drm/intel/-/issues/8060

The link between this patch and these pretty serious bugs might not be
immediately clear, so here's an explanation:

* On ATSM the PL1 power limit is disabled at power up and there was no
  means to enable it, so in 6fd3d8bf89fc we implemented the means to
  enable the limit when the PL1 hwmon entry (power1_max) is written to.

* Now there is an IGT, igt@i915_hwmon@hwmon_write, which (a) reads the
  original value from all hwmon sysfs entries, (b) does a bunch of random
  writes and finally (c) restores the original value read. On ATSM, since
  the original value was 0, when the IGT restores the 0 value the PL1
  limit is now enabled with a value of 0.

* A PL1 limit of 0 implies a low PL1 limit, which causes the GPU freq to
  fall to 100 MHz. This causes GuC FW load and several IGTs to start
  timing out, and gives rise to the above (and even more) bugs about GuC
  FW load timing out.

* After this patch, writing 0 disables the PL1 limit instead of enabling
  it, avoiding the freq drop issue above and resolving this Intel CI
  issue.

Thanks.
--
Ashutosh

[1] https://patchwork.freedesktop.org/patch/522612/?series=113972&rev=1
[2] https://patchwork.freedesktop.org/patch/522652/?series=113984&rev=1
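
For illustration only, the read/write/restore sequence described in the
bullet points above can be modelled with a toy Python sketch. This is not
the driver code: Pl1Model, zero_disables, hwmon_write_sequence and the
write values are hypothetical names and numbers made up for this example.
The only things taken from the explanation are that the limit is disabled
at power up, that the old interface showed 0 in that state, and that
writing to power1_max enables the limit.

```python
from dataclasses import dataclass


@dataclass
class Pl1Model:
    """Toy model of the PL1 limit: an enable bit plus a limit value."""
    enabled: bool = False        # ATSM powers up with the PL1 limit disabled
    limit: int = 0               # raw limit value (hypothetical units)
    zero_disables: bool = False  # False models the behaviour before the patch

    def read_power1_max(self) -> int:
        # With the patch, a disabled limit is reported as the special value 0.
        if self.zero_disables and not self.enabled:
            return 0
        # Before the patch the raw value is shown, which is 0 at power up.
        return self.limit

    def write_power1_max(self, value: int) -> None:
        # With the patch, writing 0 disables the limit.
        if self.zero_disables and value == 0:
            self.enabled = False
            return
        # Otherwise any write enables the limit with the written value.
        self.enabled = True
        self.limit = value

    def throttled(self) -> bool:
        # An enabled limit of 0 is the state that causes the freq drop.
        return self.enabled and self.limit == 0


def hwmon_write_sequence(model: Pl1Model, writes: list[int]) -> None:
    """Mimic the test: read original, do some writes, restore original."""
    orig = model.read_power1_max()
    for value in writes:
        model.write_power1_max(value)
    model.write_power1_max(orig)


if __name__ == "__main__":
    for zero_disables in (False, True):
        model = Pl1Model(zero_disables=zero_disables)
        hwmon_write_sequence(model, [5_000_000, 10_000_000])
        print(zero_disables, model.enabled, model.throttled())
```

In this model the restore step leaves the limit enabled at 0 when
zero_disables is False, and leaves it disabled when zero_disables is True,
which is the difference the patch is meant to make.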