From: Takashi Iwai <tiwai@suse.de>
To: Bjorn Helgaas <helgaas@kernel.org>
Cc: alsa-devel@alsa-project.org, linux-pm@vger.kernel.org,
linux-pci@vger.kernel.org,
"Rafael J. Wysocki" <rjw@rjwysocki.net>,
linux-kernel@vger.kernel.org,
"Alex Xu (Hello71)" <alex_y_xu@yahoo.ca>,
Roy Spliet <nouveau@spliet.org>
Subject: Re: Unrecoverable AER error when resuming from RAM (hda regression in 5.7-rc2)
Date: Wed, 22 Apr 2020 23:25:04 +0200
Message-ID: <s5h8sinxlfz.wl-tiwai@suse.de>
In-Reply-To: <20200422205028.GA223132@google.com>
On Wed, 22 Apr 2020 22:50:28 +0200,
Bjorn Helgaas wrote:
>
> [+cc Rafael, linux-pm]
>
> On Tue, Apr 21, 2020 at 03:08:44PM -0400, Alex Xu (Hello71) wrote:
> > With 5.7-rc2, after resuming from suspend to RAM, I get:
> >
> > [ 55.679382] pcieport 0000:00:03.1: AER: Multiple Uncorrected (Non-Fatal) error received: 0000:00:00.0
> > [ 55.679405] pcieport 0000:00:03.1: AER: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
> > [ 55.679410] pcieport 0000:00:03.1: AER: device [1022:1453] error status/mask=00100000/04400000
> > [ 55.679414] pcieport 0000:00:03.1: AER: [20] UnsupReq (First)
> > [ 55.679417] pcieport 0000:00:03.1: AER: TLP Header: 40000004 0a0000ff fffc0e80 00000000
> > [ 55.679423] amdgpu 0000:0a:00.0: AER: can't recover (no error_detected callback)
> > [ 55.679425] snd_hda_intel 0000:0a:00.1: AER: can't recover (no error_detected callback)
> > [ 55.679455] pcieport 0000:00:03.1: AER: device recovery failed
>
> I'm not at all confident in my decoding skills, but I *think* the TLP
> header decodes to:
>
> Fmt 010b 3 DW header with data (32-bit address)
> Type 00000b MWr
> Length 0x4 4 DW = 16 bytes
> Requester ID 0x0a00 0a:00.0
> Byte enables 0xff
> Address 0xfffc0e80
>
> which would mean the 0a:00.0 GPU did a 16-byte write to 0xfffc0e80,
> and the 00:03.1 Root Port reported that as an Unsupported Request.
> I don't know why that would be unless the address is invalid.
>
> Maybe that's supposed to be an MSI address? Maybe a complete dmesg or
> /proc/iomem would have a clue?
>
> I feel like this UR issue could be a PCI core issue or maybe some sort
> of misuse of PCI power management, but I can't seem to get traction on
> it.
>
> > Then the display freezes and the system basically falls apart (can't
> > even sudo reboot -f, need to use magic sysrq).
> >
> > I bisected this to "ALSA: hda: Skip controller resume if not needed".
> > Setting snd_hda_intel.power_save=0 resolves the issue.
>
> FWIW, the complete citation is c4c8dd6ef807 ("ALSA: hda: Skip
> controller resume if not needed"),
> https://git.kernel.org/linus/c4c8dd6ef807, which first appeared in
> v5.7-rc2.

Yes, and I just posted a fix:
https://lore.kernel.org/r/20200422203744.26299-1-tiwai@suse.de

A possible cause is the tricky resume code shared by the HD-audio
controller (the parent PCI device) and the codec devices.

The patch above at least seems to work on the reporter's machine. It
needs a bit more testing before merging, but it looks promising so
far.

thanks,

Takashi
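
The TLP header decode quoted above can be checked mechanically. The
following is a minimal Python sketch, not kernel code:
decode_tlp_header is a hypothetical helper name, and the field layout
assumes a memory request header without TLP prefixes. It parses the
four words printed on the "TLP Header:" log line.

```python
def decode_tlp_header(header: str) -> dict:
    """Decode the four hex words from an AER "TLP Header:" log line.

    Assumption: the TLP is a memory request (MRd/MWr) with no TLP
    prefix, so DW1 carries Requester ID, Tag and byte enables, and the
    address starts at DW2.
    """
    dw = [int(word, 16) for word in header.split()]
    if len(dw) != 4:
        raise ValueError("expected four 32-bit hex words")

    fmt = (dw[0] >> 29) & 0x7        # Fmt[2:0]
    tlp_type = (dw[0] >> 24) & 0x1F  # Type[4:0]
    length_dw = dw[0] & 0x3FF        # Length in DW; 0 encodes 1024
    if length_dw == 0:
        length_dw = 1024

    requester = (dw[1] >> 16) & 0xFFFF
    bus = requester >> 8
    dev = (requester >> 3) & 0x1F
    fn = requester & 0x7

    # Fmt bit 0 selects a 4 DW header (64-bit address),
    # Fmt bit 1 says the TLP carries a data payload.
    if fmt & 0x1:
        address = (dw[2] << 32) | (dw[3] & ~0x3)
    else:
        address = dw[2] & ~0x3

    return {
        "fmt": fmt,
        "type": tlp_type,
        "has_data": bool(fmt & 0x2),
        "length_dw": length_dw,
        "length_bytes": length_dw * 4,
        "requester": f"{bus:02x}:{dev:02x}.{fn}",
        "tag": (dw[1] >> 8) & 0xFF,
        "byte_enables": dw[1] & 0xFF,  # Last DW BE [7:4], First DW BE [3:0]
        "address": address,
    }


if __name__ == "__main__":
    fields = decode_tlp_header("40000004 0a0000ff fffc0e80 00000000")
    for name, value in fields.items():
        shown = hex(value) if isinstance(value, int) and not isinstance(value, bool) else value
        print(f"{name:13} {shown}")
```

For the header in the report, this gives Fmt 010b with data, Type
00000b (MWr), a length of 4 DW (16 bytes), requester 0a:00.0, byte
enables 0xff and address 0xfffc0e80, which agrees with the decode
quoted above.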