From: "Saarinen, Jani" <jani.saarinen@intel.com>
To: Chris Chiu <chris.chiu@canonical.com>,
"Rafael J. Wysocki" <rafael@kernel.org>
Cc: "Brown, Len" <len.brown@intel.com>,
Karol Herbst <kherbst@redhat.com>,
Linux PM <linux-pm@vger.kernel.org>,
Linux PCI <linux-pci@vger.kernel.org>,
"Westerberg, Mika" <mika.westerberg@intel.com>,
"Rafael J. Wysocki" <rjw@rjwysocki.net>,
dri-devel <dri-devel@lists.freedesktop.org>,
Bjorn Helgaas <bhelgaas@google.com>,
"intel-gfx@lists.freedesktop.org"
<intel-gfx@lists.freedesktop.org>
Subject: Re: [Intel-gfx] NVIDIA GPU fallen off the bus after exiting s2idle
Date: Fri, 21 May 2021 07:13:26 +0000
Message-ID: <1953f07d15db4fda8a40e5ca752bef96@intel.com>
In-Reply-To: <CABTNMG12A5qJ5ygtFTa7Sk-5W=fmMxt0L90=04H5qRDD4vWGRQ@mail.gmail.com>
Hi,
> -----Original Message-----
> From: Intel-gfx <intel-gfx-bounces@lists.freedesktop.org> On Behalf Of Chris Chiu
> Sent: Friday, 21 May 2021 7:02
> To: Rafael J. Wysocki <rafael@kernel.org>
> Cc: Brown, Len <len.brown@intel.com>; Karol Herbst <kherbst@redhat.com>; Linux
> PM <linux-pm@vger.kernel.org>; Linux PCI <linux-pci@vger.kernel.org>;
> Westerberg, Mika <mika.westerberg@intel.com>; Rafael J. Wysocki
> <rjw@rjwysocki.net>; dri-devel <dri-devel@lists.freedesktop.org>; Bjorn Helgaas
> <bhelgaas@google.com>; intel-gfx@lists.freedesktop.org
> Subject: Re: [Intel-gfx] NVIDIA GPU fallen off the bus after exiting s2idle
>
> On Thu, May 6, 2021 at 5:46 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> >
> > On Tue, May 4, 2021 at 10:08 AM Chris Chiu <chris.chiu@canonical.com> wrote:
> > >
> > > Hi,
> > > We have some Intel laptops (11th generation CPU) with an NVIDIA GPU
> > > that suffer from the same GPU-falling-off-the-bus problem when exiting
> > > s2idle with an external display connected. These laptops connect the
> > > external display via HDMI/DisplayPort on a USB Type-C dock. If we
> > > enter and exit s2idle with the dock connected, the NVIDIA GPU
> > > (confirmed on 10de:24b6 and 10de:25b8) and its PCIe port come back
> > > to D0 without problems. If we enter s2idle, disconnect the dock, and
> > > then exit s2idle, both the external display and the internal panel
> > > stay blank. The dmesg below shows "nvidia 0000:01:00.0: can't change
> > > power state from D3cold to D0 (config space inaccessible)" due to the
> > > following ACPI error:
> > >
> > > [ 154.446781]
> > > [ 154.446783]
> > > [ 154.446783] Initialized Local Variables for Method [IPCS]:
> > > [ 154.446784] Local0: 000000009863e365 <Obj> Integer 00000000000009C5
> > > [ 154.446790]
> > > [ 154.446791] Initialized Arguments for Method [IPCS]: (7 arguments defined for method invocation)
> > > [ 154.446792] Arg0: 0000000025568fbd <Obj> Integer 00000000000000AC
> > > [ 154.446795] Arg1: 000000009ef30e76 <Obj> Integer 0000000000000000
> > > [ 154.446798] Arg2: 00000000fdf820f0 <Obj> Integer 0000000000000010
> > > [ 154.446801] Arg3: 000000009fc2a088 <Obj> Integer 0000000000000001
> > > [ 154.446804] Arg4: 000000003a3418f7 <Obj> Integer 0000000000000001
> > > [ 154.446807] Arg5: 0000000020c4b87c <Obj> Integer 0000000000000000
> > > [ 154.446810] Arg6: 000000008b965a8a <Obj> Integer 0000000000000000
> > > [ 154.446813]
> > > [ 154.446815] ACPI Error: Aborting method \IPCS due to previous error (AE_AML_LOOP_TIMEOUT) (20200925/psparse-529)
> > > [ 154.446824] ACPI Error: Aborting method \MCUI due to previous error (AE_AML_LOOP_TIMEOUT) (20200925/psparse-529)
> > > [ 154.446829] ACPI Error: Aborting method \SPCX due to previous error (AE_AML_LOOP_TIMEOUT) (20200925/psparse-529)
> > > [ 154.446835] ACPI Error: Aborting method \_SB.PC00.PGSC due to previous error (AE_AML_LOOP_TIMEOUT) (20200925/psparse-529)
> > > [ 154.446841] ACPI Error: Aborting method \_SB.PC00.PGON due to previous error (AE_AML_LOOP_TIMEOUT) (20200925/psparse-529)
> > > [ 154.446846] ACPI Error: Aborting method \_SB.PC00.PEG1.NPON due to previous error (AE_AML_LOOP_TIMEOUT) (20200925/psparse-529)
> > > [ 154.446852] ACPI Error: Aborting method \_SB.PC00.PEG1.PG01._ON due to previous error (AE_AML_LOOP_TIMEOUT) (20200925/psparse-529)
> > > [ 154.446860] acpi device:02: Failed to change power state to D0
> > > [ 154.690760] video LNXVIDEO:00: Cannot transition to power state D0 for parent in (unknown)
> >
> > If I were to guess, I would say that AML tries to access memory that
> > is not accessible while suspended, probably PCI config space.
> >
> > > IPCS is the last method called from \_SB.PC00.PEG1.PG01._ON, which
> > > we expect to prepare everything before the NVIDIA GPU is brought back,
> > > but it gets stuck in the loop shown below. Please refer to
> > > https://gist.github.com/mschiu77/fa4f5a97297749d0d66fe60c1d421c44
> > > for the full DSDT.dsl.
> >
> > The DSDT alone may not be sufficient.
> >
> > Can you please create a bug entry at bugzilla.kernel.org for this
> > issue and attach the full output of acpidump from one of the affected
> > machines to it? And please let me know the number of the bug.
> >
> > Also please attach the output of dmesg covering a suspend-resume
> > cycle with the dock disconnected while suspended, including the ACPI
> > messages quoted below.
> >
> > > While (One)
> > > {
> > >     If ((!IBSY || (IERR == One)))
> > >     {
> > >         Break
> > >     }
> > >
> > >     If ((Local0 > TMOV))
> > >     {
> > >         RPKG [Zero] = 0x03
> > >         Return (RPKG) /* \IPCS.RPKG */
> > >     }
> > >
> > >     Sleep (One)
> > >     Local0++
> > > }
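The loop above has three ways out: IBSY clearing (or IERR == One), the AML's own TMOV check, or the AML interpreter giving up on the While() loop. A minimal Python model can show which exit fires first. This is a hypothetical sketch, not ACPICA or BIOS code: `ibsy`/`ierr` are stand-in callables for the IBSY and IERR fields, and `ms_per_iteration` (the real cost of Sleep (One)) and `interpreter_limit_ms` (the interpreter's loop watchdog) are assumptions. The TMOV value is not in the quoted AML, so any value used with this model is hypothetical too.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class LoopResult:
    status: str      # "done", "tmov_expired", or "AE_AML_LOOP_TIMEOUT"
    iterations: int  # value of Local0 when the loop exited


def model_ipcs_wait(
    ibsy: Callable[[int], bool],
    ierr: Callable[[int], int],
    tmov: int,
    ms_per_iteration: float,
    interpreter_limit_ms: float,
) -> LoopResult:
    """Simulate the IPCS busy-wait without actually sleeping.

    ibsy/ierr are hypothetical stand-ins for the IBSY and IERR fields and
    are called with the current Local0 value. ms_per_iteration models how
    long Sleep (One) really takes; interpreter_limit_ms models the
    interpreter's own While() watchdog. Both are assumptions.
    """
    local0 = 0
    elapsed_ms = 0.0
    while True:
        # If ((!IBSY || (IERR == One))) { Break }
        if not ibsy(local0) or ierr(local0) == 1:
            return LoopResult("done", local0)
        # If ((Local0 > TMOV)) { RPKG [Zero] = 0x03; Return (RPKG) }
        if local0 > tmov:
            return LoopResult("tmov_expired", local0)
        # Sleep (One); Local0++
        elapsed_ms += ms_per_iteration
        local0 += 1
        # Simplified watchdog check at the end of each loop body; in the
        # real log the interpreter aborts the whole method chain instead.
        if elapsed_ms >= interpreter_limit_ms:
            return LoopResult("AE_AML_LOOP_TIMEOUT", local0)
```

For example, `model_ipcs_wait(lambda i: True, lambda i: 0, tmov=0x100000, ms_per_iteration=12.0, interpreter_limit_ms=30000.0)` (all hypothetical numbers) ends with `AE_AML_LOOP_TIMEOUT` before the TMOV branch is ever reached. Under these assumptions, a stuck IBSY combined with a TMOV budget longer than the interpreter's limit would be consistent with the abort chain in the log, rather than with the AML's own RPKG [Zero] = 0x03 error return.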
> > >
> > > The NVIDIA GPU's upstream PCIe port also seems to become inaccessible,
> > > judging by the following messages:
> > > [ 292.746508] pcieport 0000:00:01.0: waiting 100 ms for downstream link, after activation
> > > [ 292.882296] pci 0000:01:00.0: waiting additional 100 ms to become accessible
> > > [ 316.876997] pci 0000:01:00.0: can't change power state from D3cold to D0 (config space inaccessible)
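The "config space inaccessible" message means config reads of the device come back as all ones (0xFFFF), which is never a valid PCI Vendor ID. A rough userspace check along the same lines, as a sketch: read the Vendor ID from the device's sysfs `config` file and treat 0xFFFF as "not responding". The sysfs `config` file is a standard kernel interface; `read_vendor_id` and `config_space_accessible` are hypothetical helper names, and the BDF 0000:01:00.0 comes from the log above.

```python
import struct
from pathlib import Path

PCI_SYSFS_ROOT = Path("/sys/bus/pci/devices")


def read_vendor_id(config_path: Path) -> int:
    """Return the 16-bit Vendor ID stored at offset 0x00 of PCI config space."""
    with open(config_path, "rb") as f:
        data = f.read(2)
    if len(data) < 2:
        raise ValueError(f"short read from {config_path}")
    # PCI configuration registers are little-endian.
    (vendor_id,) = struct.unpack("<H", data)
    return vendor_id


def config_space_accessible(bdf: str, sysfs_root: Path = PCI_SYSFS_ROOT) -> bool:
    """Hypothetical helper: False when the Vendor ID reads back as all ones.

    A device (or link) that does not answer config reads returns 0xFFFF.
    """
    return read_vendor_id(sysfs_root / bdf / "config") != 0xFFFF
```

Usage would be `config_space_accessible("0000:01:00.0")`. Keep in mind that reading the sysfs `config` file may itself runtime-resume the device, so this check can perturb the state it is trying to observe.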
> > >
> > > IPCS is Intel Reference Code, and we don't know why the loop never
> > > ends just because the dock is unplugged while the system is in
> > > s2idle. Can anyone from Intel explain what is happening here?
> >
> > This list is not the right channel for inquiries related to Intel
> > support; we can only help you as Linux kernel developers in this
> > venue.
> >
> > > One more thing worth mentioning: if we unplug the display cable from
> > > the dock before entering s2idle, the NVIDIA GPU comes back without
> > > problems even if we disconnect the dock before exiting s2idle.
> > > Here's the lspci information
> > > https://gist.github.com/mschiu77/0bfc439d15d52d20de0129b1b2a86dc4
> > > and the dmesg log with ACPI trace_state enabled and dynamic debug on
> > > for drivers/pci/pci.c and drivers/acpi/device_pm.c, covering the whole
> > > s2idle enter/exit with the IPCS timeout.
> > >
> > > Any suggestion would be appreciated. Thanks.
> >
> > First, please use proper Intel support channels for BIOS-related inquiries.
> >
> > Second, please open a bug as suggested above and let's use it for
> > further communication regarding this issue as far as Linux is
> > concerned.
> >
> > Thanks!
>
> Thanks for the suggestion. I opened
> https://bugzilla.kernel.org/show_bug.cgi?id=212951 and posted a new finding in
> https://bugzilla.kernel.org/show_bug.cgi?id=212951#c13. It seems we could do
> something in the i915 driver at the very beginning of resume to handle the HPD
> (hot-plug detect) event, since we unplug the dock/dongle while suspended.
> Because this involves HPD, the PMC and the BIOS, and Windows does not hit this
> problem, I don't know which direction to take to fix it.
How about filing an i915 bug as described at https://gitlab.freedesktop.org/drm/intel/-/wikis/How-to-file-i915-bugs, so that our developers get better involved as well?
>
> Please let me know if any information is missing from the bugzilla.kernel.org ticket.
> Any suggestions would be appreciated. Thanks.
>
> Chris
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx