From: "Dixit, Ashutosh"
To: Matthew Auld
Cc: intel-xe@lists.freedesktop.org
Date: Fri, 30 Jun 2023 09:59:26 -0700
Message-ID: <87cz1c6g41.wl-ashutosh.dixit@intel.com>
Subject: Re: [Intel-xe] [PATCH v12 00/13] xe_device_mem_access fixes and related bits

On Fri, 30 Jun 2023 04:07:44 -0700, Matthew Auld wrote:
> Hi Matt,
> On 30/06/2023 07:21, Dixit, Ashutosh wrote:
> > On Mon, 26 Jun 2023 03:50:38 -0700, Matthew Auld wrote:
> >>
> >> Main goal is to fix the races in xe_device_mem_access_get(). With that fixed we
> >> can clean up some hacks and also start rolling it out to more places that need
> >> it, including now asserting it around every mmio access. We also add lockdep
> >> annotations for xe_device_mem_access_get() and fix the remaining lockdep
> >> fallout.
> >>
> >> v11 -> v12:
> >> - freq_rpe_show also needs the device to be awake
> >> - Improvements to the lockdep annotation patch
> >
> > Just FYI, fwiw this is from my local branch, but even with this series I am
> > seeing this on DG2. Probe is fine but as soon as my IGT (perf) runs the
> > trace below spews out. The unit test runs fine on RPLP. If there's a
> > temporary workaround for this I'd like to know. Thanks.
> >
> > -Ashutosh
> >
> > [ 486.110571] xe: loading out-of-tree module taints kernel.
> > [ 486.131776] xe 0000:03:00.0: vgaarb: deactivate vga console
> > [ 486.133136] GT topology dss mask (geometry): 00000000,0000ff00
> > [ 486.133139] GT topology dss mask (compute): 00000000,0000ff00
> > [ 486.133141] GT topology EU mask per DSS: 0000ffff
> > [ 486.133636] xe 0000:03:00.0: [drm] VISIBLE VRAM: 0x0000004000000000, 0x0000000200000000
> > [ 486.133686] xe 0000:03:00.0: [drm] VRAM[0, 0]: 0x0000004000000000, 0x000000017c800000
> > [ 486.133688] xe 0000:03:00.0: [drm] Total VRAM: 0x0000004000000000, 0x0000000180000000
> > [ 486.133690] xe 0000:03:00.0: [drm] Available VRAM: 0x0000004000000000, 0x000000017c800000
> > [ 486.195204] xe 0000:03:00.0: [drm] Using GuC firmware (70.5) from i915/dg2_guc_70.bin
> > [ 486.198192] xe 0000:03:00.0: [drm] HuC disabled
> > [ 486.241959] xe 0000:03:00.0: [drm] ccs0 fused off
> > [ 486.241964] xe 0000:03:00.0: [drm] ccs2 fused off
> > [ 486.241965] xe 0000:03:00.0: [drm] ccs3 fused off
> > [ 486.242509] xe REG[0x223a8-0x223af]: allow read access
> > [ 486.242606] xe REG[0x1c03a8-0x1c03af]: allow read access
> > [ 486.242708] xe REG[0x1d03a8-0x1d03af]: allow read access
> > [ 486.242826] xe REG[0x1c83a8-0x1c83af]: allow read access
> > [ 486.242945] xe REG[0x1d83a8-0x1d83af]: allow read access
> > [ 486.243033] xe REG[0x1c3a8-0x1c3af]: allow read access
> > [ 486.306291] [drm] Initialized xe 1.1.0 20201103 for 0000:03:00.0 on minor 0
> > [ 486.309344] insmod (3290) used greatest stack depth: 10936 bytes left
> > [ 487.559809] xe 0000:03:00.0: [drm] GT0: suspended
>
> Device hits runtime suspend after probing the device. Looks normal so far...
>
> > [ 500.096224] [IGT] perf: executing
> > [ 502.830435] pcieport 0000:01:00.0: not ready 1023ms after resume; giving up
> > [ 502.832901] pcieport 0000:01:00.0: Unable to change power state from D3cold to D0, device inaccessible
> > [ 502.835939] pcieport 0000:02:01.0: Unable to change power state from D3cold to D0, device inaccessible
> > [ 502.836769] pcieport 0000:02:04.0: Unable to change power state from D3cold to D0, device inaccessible
> > [ 505.070434] xe 0000:03:00.0: not ready 1023ms after resume; giving up
> > [ 505.071074] xe 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible
>
> And here we are trying to resume the device. This is still deep in PCI
> stuff, and it's already looking bad since the device is unable to exit from
> D3cold. Also we shouldn't even be in D3cold here (it implies PCI device is
> powered off), but only D3hot. For reference on my DG2 it only goes from D0
> -> D3hot on runtime suspend and D3hot -> D0 on runtime resume. D3cold is
> explicitly disabled for now with rpm as per xe->d3cold_allowed. But even
> so, it's unclear why it can't restore power and get back to D0 (device is
> maybe unresponsive/dead?). Although even if it did get as far as the driver
> part of the resume it would still be all kinds of broken since VRAM has
> been nuked.
>
> I would assume there is something broken/faulty with that system.

Yes, now that I think about it, I believe the IGT did work on a different
system.

> You could maybe try disabling rpm, and avoiding forced suspend/resume on
> that system:
>
> --- a/drivers/gpu/drm/xe/xe_pm.c
> +++ b/drivers/gpu/drm/xe/xe_pm.c
> @@ -124,7 +124,6 @@ void xe_pm_runtime_init(struct xe_device *xe)
>  	pm_runtime_use_autosuspend(dev);
>  	pm_runtime_set_autosuspend_delay(dev, 1000);
>  	pm_runtime_set_active(dev);
> -	pm_runtime_allow(dev);
>  	pm_runtime_mark_last_busy(dev);
>  	pm_runtime_put_autosuspend(dev);

Thanks, yes this did unblock me on this system.

Ashutosh
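A possible alternative to patching xe_pm.c, not tried on this system: leave the driver as is and write "on" to the device's runtime PM control attribute in sysfs. As I understand the PM core, writing "on" to power/control calls pm_runtime_forbid(), which should leave the device in much the same state as never calling pm_runtime_allow() in xe_pm_runtime_init(). The sketch below assumes the standard /sys/bus/pci/devices/<BDF>/power/control layout and takes the 0000:03:00.0 address from the log above; set_runtime_pm_control() is a hypothetical helper, not part of xe or any library.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of xe or any library): write a runtime PM
 * policy to a device's sysfs power/control attribute, assumed to live at
 * e.g. /sys/bus/pci/devices/0000:03:00.0/power/control.
 *
 *   "on"   -> PM core calls pm_runtime_forbid(): device is kept active.
 *   "auto" -> PM core calls pm_runtime_allow(): runtime suspend allowed.
 *
 * Returns 0 on success, -1 on an invalid mode or any I/O error.
 */
int set_runtime_pm_control(const char *path, const char *mode)
{
	FILE *f;
	int ret = 0;

	/* power/control only accepts these two keywords. */
	if (strcmp(mode, "on") != 0 && strcmp(mode, "auto") != 0)
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;

	/* Write the bare keyword; no trailing newline is needed. */
	if (fputs(mode, f) == EOF)
		ret = -1;

	/* stdio buffers the write, so a rejected store may only show up here. */
	if (fclose(f) != 0)
		ret = -1;

	return ret;
}
```

It would be called as root with something like set_runtime_pm_control("/sys/bus/pci/devices/0000:03:00.0/power/control", "on"). One caveat: pm_runtime_forbid() resumes the device if it is already runtime suspended, so on this machine the write would likely have to happen before the first runtime suspend (the autosuspend delay set in xe_pm_runtime_init() is 1000 ms); otherwise it would probably run into the same D3cold resume failure.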