From: "Saarinen, Jani"
To: Chris Chiu, "Rafael J. Wysocki"
CC: "Brown, Len", Karol Herbst, Linux PM, Linux PCI, "Westerberg, Mika",
    "Rafael J. Wysocki", dri-devel, "Bjorn Helgaas",
    "intel-gfx@lists.freedesktop.org"
Subject: RE: [Intel-gfx] NVIDIA GPU fallen off the bus after exiting s2idle
Date: Fri, 21 May 2021 07:13:26 +0000
Message-ID: <1953f07d15db4fda8a40e5ca752bef96@intel.com>
X-Mailing-List: linux-pci@vger.kernel.org

Hi,

> -----Original Message-----
> From: Intel-gfx On Behalf Of Chris Chiu
> Sent: Friday, 21 May 2021 7:02
> To: Rafael J. Wysocki
> Cc: Brown, Len; Karol Herbst; Linux PM; Linux PCI;
> Westerberg, Mika; Rafael J. Wysocki; dri-devel; Bjorn Helgaas;
> intel-gfx@lists.freedesktop.org
> Subject: Re: [Intel-gfx] NVIDIA GPU fallen off the bus after exiting s2idle
>
> On Thu, May 6, 2021 at 5:46 PM Rafael J. Wysocki wrote:
> >
> > On Tue, May 4, 2021 at 10:08 AM Chris Chiu wrote:
> > >
> > > Hi,
> > > We have some Intel laptops (11th generation CPU) with NVIDIA GPU
> > > suffering the same GPU falling off the bus problem while exiting
> > > s2idle with external display connected. These laptops connect the
> > > external display via the HDMI/DisplayPort on a USB Type-C interfaced
> > > dock.
> > > If we enter and exit s2idle with the dock connected, the
> > > NVIDIA GPU (confirmed on 10de:24b6 and 10de:25b8) and the PCIe port
> > > can come back to D0 w/o problem. If we enter the s2idle, disconnect
> > > the dock, then exit the s2idle, both external display and the panel
> > > will remain with no output. The dmesg as follows shows the "nvidia
> > > 0000:01:00.0: can't change power state from D3cold to D0 (config
> > > space inaccessible)" due to the following ACPI error
> > > [ 154.446781]
> > > [ 154.446783]
> > > [ 154.446783] Initialized Local Variables for Method [IPCS]:
> > > [ 154.446784] Local0: 000000009863e365 Integer 00000000000009C5
> > > [ 154.446790]
> > > [ 154.446791] Initialized Arguments for Method [IPCS]: (7 arguments defined for method invocation)
> > > [ 154.446792] Arg0: 0000000025568fbd Integer 00000000000000AC
> > > [ 154.446795] Arg1: 000000009ef30e76 Integer 0000000000000000
> > > [ 154.446798] Arg2: 00000000fdf820f0 Integer 0000000000000010
> > > [ 154.446801] Arg3: 000000009fc2a088 Integer 0000000000000001
> > > [ 154.446804] Arg4: 000000003a3418f7 Integer 0000000000000001
> > > [ 154.446807] Arg5: 0000000020c4b87c Integer 0000000000000000
> > > [ 154.446810] Arg6: 000000008b965a8a Integer 0000000000000000
> > > [ 154.446813]
> > > [ 154.446815] ACPI Error: Aborting method \IPCS due to previous error (AE_AML_LOOP_TIMEOUT) (20200925/psparse-529)
> > > [ 154.446824] ACPI Error: Aborting method \MCUI due to previous error (AE_AML_LOOP_TIMEOUT) (20200925/psparse-529)
> > > [ 154.446829] ACPI Error: Aborting method \SPCX due to previous error (AE_AML_LOOP_TIMEOUT) (20200925/psparse-529)
> > > [ 154.446835] ACPI Error: Aborting method \_SB.PC00.PGSC due to previous error (AE_AML_LOOP_TIMEOUT) (20200925/psparse-529)
> > > [ 154.446841] ACPI Error: Aborting method \_SB.PC00.PGON due to previous error (AE_AML_LOOP_TIMEOUT) (20200925/psparse-529)
> > > [ 154.446846] ACPI Error: Aborting method \_SB.PC00.PEG1.NPON due to previous error (AE_AML_LOOP_TIMEOUT) (20200925/psparse-529)
> > > [ 154.446852] ACPI Error: Aborting method \_SB.PC00.PEG1.PG01._ON due to previous error (AE_AML_LOOP_TIMEOUT) (20200925/psparse-529)
> > > [ 154.446860] acpi device:02: Failed to change power state to D0
> > > [ 154.690760] video LNXVIDEO:00: Cannot transition to power state D0 for parent in (unknown)
> >
> > If I were to guess, I would say that AML tries to access memory that
> > is not accessible while suspended, probably PCI config space.
> >
> > > The IPCS is the last function called from \_SB.PC00.PEG1.PG01._ON
> > > which we expect it to prepare everything before bringing back the
> > > NVIDIA GPU but it's stuck in the infinite loop as described below.
> > > Please refer to
> > > https://gist.github.com/mschiu77/fa4f5a97297749d0d66fe60c1d421c44
> > > for the full DSDT.dsl.
> >
> > The DSDT alone may not be sufficient.
> >
> > Can you please create a bug entry at bugzilla.kernel.org for this
> > issue and attach the full output of acpidump from one of the affected
> > machines to it? And please let me know the number of the bug.
> >
> > Also please attach the output of dmesg including a suspend-resume
> > cycle including dock disconnection while suspended and the ACPI
> > messages quoted below.
> >
> > > While (One)
> > > {
> > >     If ((!IBSY || (IERR == One)))
> > >     {
> > >         Break
> > >     }
> > >
> > >     If ((Local0 > TMOV))
> > >     {
> > >         RPKG [Zero] = 0x03
> > >         Return (RPKG) /* \IPCS.RPKG */
> > >     }
> > >
> > >     Sleep (One)
> > >     Local0++
> > > }
> > >
> > > And the upstream PCIe port of NVIDIA seems to become inaccessible
> > > due to the messages as follows.
> > > [ 292.746508] pcieport 0000:00:01.0: waiting 100 ms for downstream link, after activation
> > > [ 292.882296] pci 0000:01:00.0: waiting additional 100 ms to become accessible
> > > [ 316.876997] pci 0000:01:00.0: can't change power state from D3cold to D0 (config space inaccessible)
> > >
> > > Since the IPCS is the Intel Reference Code and we don't really know
> > > why the never-end loop happens just because we unplug the dock while
> > > the system still stays in s2idle. Can anyone from Intel suggest what
> > > happens here?
> >
> > This list is not the right channel for inquiries related to Intel
> > support, we can only help you as Linux kernel developers in this
> > venue.
> >
> > > And one thing also worth mentioning, if we unplug the display cable
> > > from the dock before entering the s2idle, NVIDIA GPU can come back
> > > w/o problem even if we disconnect the dock before exiting s2idle.
> > > Here's the lspci information
> > > https://gist.github.com/mschiu77/0bfc439d15d52d20de0129b1b2a86dc4
> > > and the dmesg log with ACPI trace_state enabled and dynamic debug on
> > > for drivers/pci/pci.c, drivers/acpi/device_pm.c for the whole s2idle
> > > enter/exit with IPCS timeout.
> > >
> > > Any suggestion would be appreciated. Thanks.
> >
> > First, please use proper Intel support channels for BIOS-related inquiries.
> >
> > Second, please open a bug as suggested above and let's use it for
> > further communication regarding this issue as far as Linux is
> > concerned.
> >
> > Thanks!
>
> Thanks for the suggestion. I opened
> https://bugzilla.kernel.org/show_bug.cgi?id=212951 and have a new finding in
> https://bugzilla.kernel.org/show_bug.cgi?id=212951#c13. It seems that maybe we
> could do something in the i915 driver during resume to handle the hpd (because we
> unplug the dock/dongle when suspended) at the very beginning.
> Since it involves HPD, PMC and the BIOS, I don't
> know which way I should go to fix this since Windows won't hit this problem.

How about filing a bug as described in
https://gitlab.freedesktop.org/drm/intel/-/wikis/How-to-file-i915-bugs
so that our developers get involved as well?

>
> Please let me know if there's any information missing in the bugzilla.kernel ticket.
> Any suggestions would be appreciated. Thanks
>
> Chris
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
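
For readers less familiar with ASL, the wait loop quoted from IPCS earlier in this thread can be modeled roughly as below. This is only an illustrative Python sketch, not code from the firmware or the kernel. The function name wait_for_ipc, the accessors read_ibsy and read_ierr, the injectable sleep_ms, and the assumption that the counter starts at zero are all hypothetical; only the names IBSY, IERR, TMOV, Local0 and the 0x03 status value come from the quoted DSDT excerpt.

```python
import time
from typing import Callable, Tuple

TIMEOUT_STATUS = 0x03  # value the quoted ASL stores in RPKG[0] on timeout


def wait_for_ipc(
    read_ibsy: Callable[[], bool],  # hypothetical accessor for the IBSY field
    read_ierr: Callable[[], int],   # hypothetical accessor for the IERR field
    tmov: int,                      # stand-in for TMOV (iteration budget)
    sleep_ms: Callable[[int], None] = lambda ms: time.sleep(ms / 1000.0),
    local0: int = 0,                # assumption: the counter starts at zero
) -> Tuple[str, int]:
    """Model of the quoted loop; returns (outcome, final counter value)."""
    while True:
        # Leave the loop once the interface is idle or reports an error.
        if not read_ibsy() or read_ierr() == 1:
            return ("break", local0)
        # Give up once the counter exceeds TMOV; the ASL returns RPKG
        # with TIMEOUT_STATUS in element zero at this point.
        if local0 > tmov:
            return ("timeout", local0)
        sleep_ms(1)   # Sleep (One)
        local0 += 1   # Local0++


if __name__ == "__main__":
    # A busy flag that never clears runs into the TMOV exit.
    print(wait_for_ipc(lambda: True, lambda: 0, tmov=3, sleep_ms=lambda ms: None))
```

In this model a busy flag that never clears ends in the "timeout" outcome after the counter passes tmov. The quoted log shows Local0 at 0x9C5 (2501) when the interpreter aborted the method with AE_AML_LOOP_TIMEOUT; whether that value is below TMOV cannot be told from the excerpt alone.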