From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 13 Mar 2023 16:55:36 -0700
Message-ID: <87r0tskxzb.wl-ashutosh.dixit@intel.com>
From: "Dixit, Ashutosh"
To: John Harrison
Cc: Intel-GFX@Lists.FreeDesktop.Org, DRI-Devel@Lists.FreeDesktop.Org
Subject: Re: [Intel-gfx] [PATCH 2/2] drm/i915/guc: Allow for very slow GuC loading
References: <20230217234715.3609670-1-John.C.Harrison@Intel.com>
 <20230217234715.3609670-3-John.C.Harrison@Intel.com>
 <3baf596b-cd5e-87c0-bbd4-54a0e39f9e8c@intel.com>
MIME-Version: 1.0 (generated by SEMI-EPG 1.14.7 - "Harue")
Content-Type: text/plain; charset=US-ASCII
List-Id: Intel graphics driver community testing & development

On Fri, 10 Mar 2023 17:01:42 -0800, John Harrison wrote:
>
> >> +	for (count = 0; count < 20; count++) {
> >> +		ret = wait_for(guc_load_done(uncore, &status, &success), 1000);
> >
> > Isn't 20 secs a bit too long for an in-place wait? I get that if the GuC
> > doesn't load (or fail to) within a few secs the HW is likely toast, but
> > still that seems a bit too long to me. What's the worst case load time
> > ever observed? I suggest reducing the wait to 3 secs as a compromise, if
> > that's bigger than the worst case.
>
> I can drop it to 3 for normal builds and keep 20 for
> CONFIG_DRM_I915_DEBUG_GEM builds. However, that won't actually be long
> enough for all slow situations. We have seen times of at least 11s when the
> GPU is running at minimum frequency. So, for CI runs we definitely want to
> keep the 20s limit. For end users? Is it better to wait for up to 20s or to
> boot in display only fallback mode? And note that this is a timeout only. A
> functional system will still complete in tens of milliseconds.

Just FYI, in this related patch:

https://patchwork.freedesktop.org/series/115003/#rev2

I am holding a mutex across the GuC FW load, so, although it is very
unlikely, in the worst case a thread can be blocked for the duration of the
GuC reset/FW load.

Ashutosh
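
For readers following the timeout discussion, here is a minimal userspace
sketch of the retry loop quoted above: an outer loop of one-second waits
bounded by a configurable number of seconds. It is not the i915 code.
struct fake_guc, fake_load_done(), fake_wait_for() and wait_for_load() are
hypothetical stand-ins, the clock is simulated rather than slept on, and the
0 / -1 return convention is an assumption made for this example.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the hardware state polled by the driver. */
struct fake_guc {
	unsigned int now_ms;      /* simulated clock, in milliseconds */
	unsigned int ready_at_ms; /* simulated time at which the load completes */
};

/* Hypothetical condition check, playing the role of guc_load_done(). */
static bool fake_load_done(const struct fake_guc *guc)
{
	return guc->now_ms >= guc->ready_at_ms;
}

/*
 * Poll the condition once per simulated millisecond for up to timeout_ms.
 * Returns 0 as soon as the condition holds, -1 on timeout (assumed convention).
 */
static int fake_wait_for(struct fake_guc *guc, unsigned int timeout_ms)
{
	for (unsigned int waited = 0; waited < timeout_ms; waited++) {
		if (fake_load_done(guc))
			return 0;
		guc->now_ms++; /* advance the simulated clock instead of sleeping */
	}
	return fake_load_done(guc) ? 0 : -1;
}

/*
 * Outer retry loop: up to max_secs waits of one second each.
 * A fast load returns early; only a slow or failed load uses the full budget.
 */
static int wait_for_load(struct fake_guc *guc, unsigned int max_secs)
{
	for (unsigned int count = 0; count < max_secs; count++) {
		if (fake_wait_for(guc, 1000) == 0)
			return 0;
	}
	return -1;
}
```

With the numbers from the thread, a simulated load that completes after 30 ms
returns at once under either limit, while one that needs 11 s times out with
max_secs = 3 and succeeds with max_secs = 20. The limit only bounds the
failure path, which is the point made in the quoted reply.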